Abstract

Using mobile robots for autonomous patrolling of environments to prevent 
intrusions is a topic of increasing practical relevance. One of the most chal- 
lenging scientific issues is the problem of finding effective patrolling strategies 
that, at each time point, determine the next moves of the patrollers in order 
to maximize some objective function. In recent years this problem
has been addressed in a game theoretical fashion, explicitly considering the 
presence of an adversarial intruder. The general idea is that of modeling a 
patrolling situation as a game, played by the patrollers and the intruder, and 
of studying the equilibria of this game to derive effective patrolling
strategies. In this paper we present a formal game theoretical framework for
determining effective patrolling strategies that extends previous proposals in
the literature by considering environments with arbitrary topology and
arbitrary preferences for the agents. The main original contributions of this
paper are the formulation of the patrolling game for generic graph
environments, an algorithm for finding a deterministic equilibrium strategy,
which is a fixed path through the vertices of the graph, and an algorithm for
finding a non-deterministic equilibrium strategy, which is a set of
probabilities for moving between adjacent vertices of the graph. Both
algorithms are studied analytically and validated experimentally to assess
their properties and efficiency.
multiagent systems.



1. Introduction 

A patrolling situation is characterized by one or more patrollers and by 
some targets, located in an environment, that have to be patrolled in order to 
prevent the entering of a possible intruder. Patrollers move in the environ- 
ment according to a patrolling strategy that, at each time point, determines 
their next action. One of the most interesting scientific issues is the deter- 
mination of effective patrolling strategies for autonomous robotic patrollers, 
namely, of strategies that drive the robots around environments to maximize 
some objective function. Several objective functions can be defined according 
to the specific patrolling situation, and works in literature can be approxi- 
mately divided into two main classes. In the first one, the presence of the 
adversary, i.e., the intruder, is not taken into account and the problem of 
determining the optimal patrolling strategy comes down to the problem of 
exploring or covering the environment. This problem can be formulated as 
a classic decision theory problem where the objective function can be, for 
instance, a coverage index or an entropy-based metric [H, 0, 0, 0, 0|. In the 
second class of works, the presence of an adversary is considered and the 
problem is addressed in a game theoretical fashion p, 0, @, B]- In this case, 
the patrollers' objective function to be maximized is their expected utility (in
the sense of von Neumann and Morgenstern [10]) computed over the game
outcomes. The intruder is usually modeled as a rational agent that observes 
the strategy of the patrollers and acts to maximize its expected utility. That 
is, the intruder is assumed to be as strong as possible. Roughly speaking, 
the contributions belonging to the first class come mostly from the mobile 
robotic community, while the contributions belonging to the second class 
come mostly from the multiagent community. Recently, it has been shown 
that considering a model of the adversary can give the patrolling robots a 
larger expected utility than the case in which the opponent is not modeled [6].

In this paper we present a formal framework for the determination of ef- 
fective patrolling strategies that belongs to the class of game theoretical ap- 
proaches. The proposed approach is based on the general idea of modeling a 
patrolling situation as a game, played by the patrollers and the intruder, and
of studying the equilibria [10] of this game to derive the optimal patrolling
strategy. Now we survey the main related works, to set the background for 
introducing in more detail our original contributions. 

1.1. Game Theoretical Patrolling Models for Mobile Robots

Game theoretical approaches model a patrolling situation as a non-cooperative
game [10] between the patrollers and the intruder. Two main approaches have
been proposed in the literature for robotic patrolling with adversaries:
one does not explicitly model the preferences of the adversaries 0], whereas 
the other one does [a, 0] . Before reviewing these approaches, we note that 
similar strategic problems have been addressed in the pursuit-evasion field
(e.g., [11, 12]) and by hide-and-seek approaches [13], where a hider can hide
itself in a vertex of an arbitrary graph and a seeker can move along the graph 
to seek the hider within a finite time. However, some assumptions, including 
the fact that the evader's goal is only to avoid capture and not to enter an 
area of interest, make the pursuit-evasion problem not directly comparable 
with the patrolling problem we are considering. 

We now describe the main approach that does not model the intruder's
preferences. In [7], the authors consider the problem of patrolling a perimeter
divided into cells, each one giving access to an area of interest, by employing
a team of synchronized mobile robots acting in turns. The perimeter is modeled
as a ring in which the intruder needs the same time, say d turns, to enter
through any cell. The robots keep an evenly separated formation by moving in a
coordinated fashion. (The authors show that this patrolling configuration is
optimal for their settings.) The patrolling strategy does not depend on the
specific cells in which the robots are. The authors present different movement
models for the robots. In the simplest one, all the robots move clockwise with
probability p or counterclockwise with probability 1 - p. In the most
realistic movement model, all the patrollers move to the cells they are headed
to with probability p and reverse their heading, staying in their current cells
for a turn, with probability 1 - p. The intruder is assumed to be in the
position to repeatedly observe the actions of the patroller (staying hidden),
derive a correct belief over the patroller's strategy, and attack the cell for
which the probability of being captured is the smallest (this is because no
preferences over the cells are considered). The optimal patrolling strategy
amounts to choosing the value of p that maximizes the minimum expected
utility for the patrollers or, equivalently, that max-minimizes the detection
probability. Two interesting extensions to this work are worth citing. In [14],
the authors study the impact of the intruder's knowledge on the performance
of the patrolling strategy (e.g., when the intruder has zero or partial
knowledge of the patroller's strategy). In [15], the authors study the impact of
uncertainty over the sensed data. These works present two main limitations.
First, they are applicable only to very special ring-like environments where
all the cells have the same penetration time and the patrollers have no pref- 
erences over the cells. Second, the strategy they produce is optimal only 
when the intruder has no preferences over the cells. The work we present 
in this paper overcomes these limitations by considering arbitrary topologies 
and preferences for patroller and intruder. 

We now turn to the main approach that explicitly models the
intruder's preferences. In [9], the authors deal with the problem of patrolling
n areas by using a single robotic patroller such that the number of turns it
would spend to patrol all the areas is strictly larger than the time d needed by
the intruder to enter an area. They model such a problem as a two-player
(i.e., the patroller and the intruder) strategic-form game with incomplete
information (i.e., the intruder's preferences over the areas can be uncertain to
the patroller) [10]. The actions available to the patroller are all the possible
routes of n areas, while the intruder chooses a single area to enter. The 
intruder is assumed to be in the position to repeatedly observe the actions 
of the patroller (staying hidden), derive a correct belief over the patroller's 
strategy, and find its best response to the patroller's strategy. The appro- 
priate equilibrium concept, in which the patroller maximizes its expected 
utility, is the leader-follower equilibrium lq . (A slight variation of this ap- 
proach has been applied to the problem of patrolling n access points with 
m < n static checkpoints at the Los Angeles International Airport [13].) As 
discussed in [lij], the approach in 0] presents two drawbacks. First, no topol- 
ogy connecting the areas is considered and therefore it can hardly be applied 
to real-world patrolling settings, where areas are usually connected by intri- 
cate topologies and the patroller cannot move between any two areas in a 
single turn. Second, if the decisions of the patroller are over the next route to 
patrol (instead of over the next area), the intruder can increase its expected 
utility by waiting, observing the patroller's actions, and then choosing the 
turn in which to enter. The work we present in this paper eliminates these 
drawbacks, by considering settings with arbitrary topologies and allowing 
the patroller to decide over the next area to patrol. 

1.2. Main Original Contributions 

We present an approach to robotic patrolling that is more general than 
the game theoretical approaches discussed above, since it deals with environ- 
ments with arbitrary topology and with arbitrary preferences for the agents. 
In particular, the main original contributions of this paper can be summarized
as follows.



• We formulate the patrolling problem as a game played by the patroller
and an intruder in an environment described as a graph. In this setting,
the patrolling strategy amounts to determining the vertex the patroller
should visit next. Assuming that the intruder can observe the patroller
for an indefinitely long time, the optimal patrolling strategy is found
by calculating the leader-follower equilibrium of the game.

• We propose an algorithm to find a deterministic equilibrium strategy,
which consists of a fixed path through the vertices of the graph such
that, when the patroller follows it, attempting an intrusion is not the
best action for a rational intruder.

• We propose an algorithm to find a non-deterministic equilibrium
strategy, which consists of a set of probabilities for moving between
adjacent vertices of the graph such that, when the patroller follows it, its
expected utility is maximized.

Both algorithms are studied analytically and validated experimentally
to assess their properties and efficiency. We explicitly note that some
preliminary results about the algorithms for finding the deterministic and
non-deterministic equilibrium strategies have been reported in [19] and in
[20, 21, 22, 23], respectively.



1.3. Structure of the Paper 

In the next section we formulate the problem and give an overview of the main
results presented in the rest of the paper. In Sections 3 and 4 we present
the algorithms for finding the deterministic and the non-deterministic
equilibrium strategies, respectively. Both sections have a similar structure:
they present the specific state of the art, introduce the proposed algorithms,
analyze them theoretically, and validate them experimentally. Section 5
discusses some extensions to our framework that capture more realistic
aspects and improve its efficiency. Finally, Section 6 concludes the paper.

2. Problem Formulation and Overview of the Main Results 

In this section, we formalize the patrolling setting we study (Section 2.1),
we present our game model (Section 2.2), we discuss the appropriate solution
concept (Section 2.3), and we summarize the main results we provide in this
paper (Section 2.4).

2.1. Patrolling Setting 

We study settings that are characterized by the following features: 

• the environment is represented by a directed graph (as in Q); 

• there is a single patrolling robot equipped with sensors (e.g., a camera) 
able to detect intruders (as in Si); 

• the intruder can perfectly observe the patroller's strategy before acting 
and derive a correct belief over it (as in 0, 0]); 

• time is discretized in turns (as in 0,0]); 

• the intruder enters in vertices and cannot do anything else for some 
turns once it has attempted to enter a vertex (this amounts to saying that
penetration takes some turns to be completed, as in 0, 0]); 

• the patroller and the intruder are assumed to be rational agents (as
in Si). 

The patrolling setting is described by a directed graph $G = (V, A, T, v, d)$.
$V$ is a set of $n$ vertices to be patrolled. $A$ is the set of arcs connecting
the vertices. We often represent $A$ by a function $a : V \times V \to \{0, 1\}$,
where $a(i,j) = 1$ means that there exists an arc directed from vertex $i$
to vertex $j$ and $a(i,j) = 0$ means that there is not. Given a vertex $i$, a
vertex $j$ is adjacent to $i$ if $a(i,j) = 1$. $T \subseteq V$ contains the vertices that have
some value for both the patroller and the intruder. We call these vertices
targets. In practical applications, a target may represent an access point
to an area with some value (e.g., a door, as in [7]) or an area with some
value (e.g., an house, as in i). Vertices that are not targets (in V \ T) are 
part of paths that the patroller traverses to move between targets. $v$ is a
pair of functions $(v_{\mathbf{p}}, v_{\mathbf{i}})$, where $v_{\mathbf{p}} : T \to \mathbb{R}$ assigns each target a value for
the patroller and $v_{\mathbf{i}} : T \to \mathbb{R}$ assigns each target a value for the intruder.
Patroller and intruder can assign different values to the same target. The
function $d : T \to \mathbb{N} \setminus \{0\}$ assigns each target a time interval (measured in
turns) that the intruder must spend to successfully enter it. We call $d(i)$ the
intruder's penetration time for target $i$. We discuss in Section 5 how we can
deal with situations wherein the values of $d(\cdot)$ and $v(\cdot)$ are uncertain. An
example of a patrolling setting captured by our model is shown in Figure 1.
The bold numbers identify the vertices; arcs are depicted as arrows; the set
of targets is $T = \{06, 08, 12, 14, 18\}$; the values reported in target $i$ are $d(i)$
and $(v_{\mathbf{p}}(i), v_{\mathbf{i}}(i))$. The graph representation of an environment can be derived
from a grid map of the environment, for example as discussed in [24]. The
graph of Figure 1 can represent the grid environment of Figure 2, where
every white cell of the grid corresponds to a node in the graph (black cells
are obstacles). In the following, we complete the description of our setting
by introducing sensing and action capabilities of the patroller and of the
intruder.




Figure 1: The graph representing the patrolling setting used as running example. 

The sensing capabilities of the patroller are defined by a function
$S : V \times V \to [0, 1]$, where $S(i,j)$ is the probability with which the patroller,
given that its current vertex is $i$, detects an intruder that is in vertex $j$. In
Sections 3 and 4, we assume that the patroller can sense only its current
vertex without any uncertainty. Formally, we assume that $S(i,j) = 1$
if $i = j$, and $S(i,j) = 0$ otherwise. In Section 5 we discuss how our results can
be extended to the general case in which the patroller's perceptions are
described by a generic $S(\cdot, \cdot)$. When the patroller detects the intruder, we
also say that the patroller captures the intruder. The intruder is in the
position to observe the movements of the patroller along the graph and to
derive a correct belief over the patroller's strategy. That is, the intruder
knows the patroller's strategy before acting. In Section 5 we discuss how our
results can be extended to the case in which the intruder cannot perfectly
observe the environment before acting.

Figure 2: Grid map of the environment corresponding to the graph of Figure 1.

Time is discretized in turns and $k \in \mathbb{N}$ denotes a turn. We assume a simple
movement model for the patroller: it spends one turn to move between two 
adjacent vertices in G and patrol the arrival vertex. As in 0, we assume 
that the intruder is able to appear directly in a target when it decides to enter 
and to disappear directly from the entered target. We recall that, when an 
intrusion is attempted in target t, the intruder stays there and cannot do 
anything else for d(t) turns. During these turns the intruder can be detected 
(captured) by the patroller. We discuss in Section 5 how the above model can 
be extended to situations in which the intruder reaches targets by moving 
along paths and there is a delay between the turn at which the intruder 
makes the decision to enter a target and the turn at which it actually enters. 
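
As an illustration, the tuple G = (V, A, T, v, d) and the sensing function S can be encoded directly as a small data structure. The following Python sketch is only one possible encoding: the class name `PatrollingSetting`, its field names, and the four-vertex ring used as an instance are assumptions made for illustration (the ring is not the graph of Figure 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatrollingSetting:
    """G = (V, A, T, v, d); the target set T is the key set of d."""
    vertices: frozenset   # V
    arcs: frozenset       # A, stored as a set of directed pairs (i, j)
    v_p: dict             # target -> value for the patroller
    v_i: dict             # target -> value for the intruder
    d: dict               # target -> penetration time in turns (>= 1)

    def __post_init__(self):
        targets = set(self.d)
        if not (targets == set(self.v_p) == set(self.v_i)):
            raise ValueError("v_p, v_i and d must be defined on the same targets")
        if not targets <= self.vertices:
            raise ValueError("T must be a subset of V")
        if any(i not in self.vertices or j not in self.vertices
               for i, j in self.arcs):
            raise ValueError("arcs must connect vertices of V")
        if any(t < 1 for t in self.d.values()):
            raise ValueError("penetration times must be positive integers")

    @property
    def targets(self):
        return frozenset(self.d)

    def a(self, i, j):
        """Arc indicator a(i, j) in {0, 1}."""
        return 1 if (i, j) in self.arcs else 0

    def adjacent(self, i):
        """Vertices j with a(i, j) = 1, i.e. the patroller's moves from i."""
        return sorted(j for j in self.vertices if self.a(i, j) == 1)

    def S(self, i, j):
        """Sensing assumed in Sections 3 and 4: only the current vertex."""
        return 1.0 if i == j else 0.0


# Hypothetical instance: a bidirectional ring 1-2-3-4 with targets 2 and 4.
ring = PatrollingSetting(
    vertices=frozenset({1, 2, 3, 4}),
    arcs=frozenset({(1, 2), (2, 3), (3, 4), (4, 1),
                    (2, 1), (3, 2), (4, 3), (1, 4)}),
    v_p={2: 8, 4: 6},
    v_i={2: 5, 4: 9},
    d={2: 3, 4: 2},
)
```

In this encoding `ring.adjacent(1)` lists the vertices the patroller can reach in one turn from vertex 1, which is exactly the set of its available move actions there.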

2.2. Game Model 

The game model we employ to capture a patrolling problem belongs to 
the class of two-player extensive-form games with imperfect information. In 
particular, we use a two-player dynamic repeated game [10], where the players
are the patroller agent and the intruder agent. (The game can also be
represented as a partially-observable stochastic game with infinite states.
However, since this representation does not provide any advantage, we do
not discuss it.) The game develops in turns. At each turn, a strategic-form
game is played in which the players act simultaneously. The patroller
chooses the next vertex to reach among those adjacent (directly connected)
to its current vertex. Formally, calling $i$ the vertex of the patroller at turn $k$,
its available actions are move($j$) for all vertices $j$ such that $a(i,j) = 1$. At
turn $k$, the intruder, if it has not previously attempted to enter any target,
chooses whether or not to enter a target and, in the first case, which target
to enter. Formally, its actions are wait and enter($i$). If, instead, the intruder
has previously attempted to enter a target $i$, it cannot take any action for
$d(i)$ turns after having attempted to enter. This repeated game is dynamic
since it changes at each turn: the positions of the patroller (i.e., its current 
vertex) and of the intruder (i.e., attacking a target or waiting) change. The 
game is with imperfect information since, when the patroller acts, it does not 
know whether the intruder is currently within a vertex or it is still waiting 
to attack. That is, the intruder's actions are not perfectly observable. The 
game has an infinite horizon, since the intruder is allowed to wait indefinitely 
outside the environment. 

Figure 3 reports a portion of the extensive-form representation of the
patrolling game for the setting of Figure 1, given that the initial position of
the patroller is vertex 01. Branches represent actions and players' information
sets are depicted as dotted lines. We recall that an information set of a player
is a set of decision nodes of the player that it cannot distinguish [10]. That
is, when the player is in any decision node of the information set, it just
knows that it is in a node of that set, but it does not know in which specific
node. Information sets are used to represent players' imperfect observation
of the actions of their opponents. In our game tree, we use information sets
in two ways. First, we use information sets of the intruder: given a decision
node $\eta$ of the patroller, all the decision nodes of the intruder that are direct
descendants of $\eta$ constitute an information set. This is in accordance with the
fact that, at each turn, the players act simultaneously, and the intruder
cannot observe the last action undertaken by the patroller. Second, we use
information sets of the patroller to represent the fact that it cannot observe 
the intruder's actions except when it detects the intruder in some vertex. 
The possible outcomes of the game are:

• no-attack: when the intruder never enters any target;

• intruder-capture: when the intruder attempts to enter a target $i$ at
turn $k$ and the patroller detects the intruder in target $i$ in the time
interval $\{k, k + 1, \ldots, k + d(i) - 1\}$;

• penetration-$i$: when the intruder enters a target $i$ at turn $k$ and the
patroller does not detect the intruder in target $i$ in the time interval
$\{k, k + 1, \ldots, k + d(i) - 1\}$.

Figure 3: A portion of the game tree for the patrolling setting of Figure 1 with the
patroller initially in 01.
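
The three outcomes can be made concrete with a short Python sketch. It assumes that the patroller's trajectory is given as a list `path`, where `path[k]` is the vertex patrolled at turn k, and that the patroller senses only its current vertex. The function name `outcome` and the encoding of outcomes as pairs are hypothetical choices made for this illustration.

```python
def outcome(path, target, k, d):
    """Outcome of the game for a given patroller trajectory.

    path[t] is the vertex patrolled at turn t. The intruder plays
    enter(target) at turn k (k=None means it waits forever) and stays in
    the target for the turns k, k + 1, ..., k + d - 1.
    """
    if k is None:
        return ("no-attack", None)
    window = path[k:k + d]
    if len(window) < d:
        raise ValueError("path too short to cover the penetration window")
    if target in window:
        return ("intruder-capture", target)
    return ("penetration", target)


# Hypothetical trajectory on a ring of four vertices.
path = [1, 2, 3, 4, 1, 2, 3, 4]
print(outcome(path, target=4, k=0, d=2))   # patroller visits 1, 2
print(outcome(path, target=4, k=2, d=2))   # patroller visits 3, 4
```

With this trajectory, entering target 4 at turn 0 succeeds because the patroller is in vertices 1 and 2 during the penetration window, while entering at turn 2 leads to capture.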

Agents' utility functions over the outcomes are defined as follows. The
patroller's utility function, denoted by $u_{\mathbf{p}}$, depends on the values $v_{\mathbf{p}}$ of the
preserved targets. We assume the patroller to be risk neutral, and therefore:

$$
u_{\mathbf{p}}(x) =
\begin{cases}
\sum_{i \in T} v_{\mathbf{p}}(i) & x = \text{no-attack or intruder-capture} \\
\sum_{i \in T \setminus \{j\}} v_{\mathbf{p}}(i) & x = \text{penetration-}j
\end{cases}
$$

Notice that the patroller gets the same utility when the intruder is captured
and when the intruder never enters. This is because, if a utility surplus were
given for capture, the patroller could prefer a lottery between
intruder-capture and penetration-$i$ to no-attack. This behavior is not
reasonable, since the patroller's purpose is to preserve as much value as it can.
In the case the patroller is risk averse, $u_{\mathbf{p}}$ should be defined as a concave
function of the sum of the preserved values. In the case it is risk seeking, $u_{\mathbf{p}}$
should be defined as a convex function.

The intruder's utility function, denoted by $u_{\mathbf{i}}$, depends on the value $v_{\mathbf{i}}$ of
the attacked target. We assume the intruder to be risk neutral, and therefore:

$$
u_{\mathbf{i}}(x) =
\begin{cases}
0 & x = \text{no-attack} \\
v_{\mathbf{i}}(i) & x = \text{penetration-}i \\
-\epsilon & x = \text{intruder-capture}
\end{cases}
$$

where $\epsilon \in \mathbb{R}^{+}$ is a penalty due to the capture. This is to say that the status
quo (i.e., no-attack) is better than being captured (i.e., intruder-capture) for
the intruder. In the case of a risk averse or risk seeking intruder, $u_{\mathbf{i}}$ should be
concave or convex, respectively.

Formally, we define the space $H$ of all the possible histories $h$ of vertices
visited (or, equivalently, actions taken) by the patroller. For example, for the
setting of Figure 1, given that the patroller starts from vertex 01, a possible
history is $h = (01, 02, 03, 07, 08)$. We define the patroller's strategy as
$\sigma_{\mathbf{p}} : H \to \Delta(V)$, where $\Delta(V)$ is a probability distribution over the vertices $V$ or,
equivalently, over the corresponding actions move($i$). More precisely, given
a history $h \in H$, strategy $\sigma_{\mathbf{p}}$ gives the probability with which the patroller
will move to vertices at the next turn. Such a probability can be strictly
positive only for vertices that are adjacent to that where the patroller is after
history $h$. Notice that the patroller's strategy does not depend on the actions
undertaken by the intruder. This is because the patroller cannot observe
them. (Once the patroller has observed the intruder, i.e., when the patroller
captures the intruder, the game concludes.) When $\sigma_{\mathbf{p}}$ is in pure strategies,
i.e., when $\sigma_{\mathbf{p}}$ assigns all the probability to a single vertex for each possible
history $h$, we say that the patrolling strategy is deterministic. For instance,
considering Figure 1, a deterministic strategy could prescribe the patroller
to follow the cycle (04, 05, 06, 11, 18, 17, 16, 10, 04). Otherwise, we say that
the patrolling strategy is non-deterministic. For instance, consider again
Figure 1 with the patroller in vertex 01 after a history $h$; a non-deterministic
strategy could be:

$$
\sigma_{\mathbf{p}}(h) =
\begin{cases}
01 & \text{with probability } 0.25 \\
02 & \text{with probability } 0.25 \\
06 & \text{with probability } 0.5
\end{cases}
$$

We define the intruder's strategy as $\sigma_{\mathbf{i}} : H \to \Delta(V \cup \{\text{wait}\})$, where
$\Delta(V \cup \{\text{wait}\})$ is a probability distribution over the vertices $V$ or, equivalently, over
the corresponding actions enter($i$), and the action wait.
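
For the risk-neutral case, the two utility functions of this section can be sketched in a few lines of Python. The sketch assumes that the patroller's utility is the sum of the values of the preserved targets and that the intruder gets 0 for no-attack, its value of the target for a penetration, and minus a penalty for capture. The function names `u_p` and `u_i`, the pair encoding of outcomes, and the numeric values are assumptions made for illustration.

```python
def u_p(outcome, v_p):
    """Patroller's utility: sum of the values of the preserved targets."""
    kind, target = outcome
    lost = v_p[target] if kind == "penetration" else 0
    return sum(v_p.values()) - lost


def u_i(outcome, v_i, epsilon):
    """Intruder's utility; epsilon > 0 is the penalty for being captured."""
    kind, target = outcome
    if kind == "no-attack":
        return 0
    if kind == "penetration":
        return v_i[target]
    if kind == "intruder-capture":
        return -epsilon
    raise ValueError(f"unknown outcome: {kind}")


# Hypothetical values for two targets, 2 and 4.
v_p = {2: 8, 4: 6}
v_i = {2: 5, 4: 9}
for x in [("no-attack", None), ("intruder-capture", 4), ("penetration", 4)]:
    print(x, u_p(x, v_p), u_i(x, v_i, epsilon=1))
```

Note that `u_p` returns the same value for no-attack and intruder-capture, in line with the remark that no utility surplus is given for capture.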

2.3. Leader-Follower Solution Concept

The intruder's ability to observe the patroller's strategy and act on the 
basis of such observation "naturally" induces one to analyze the game from a 
leader-follower stance, where the patroller is the leader and the intruder is the 
follower. The appropriate solution concept for leader-follower games is the 
leader-follower (also called Stackelberg) equilibrium. Its peculiarity is that
the leader commits to a strategy and the follower acts as a best responder
given such commitment. Rigorously speaking, the follower is not just a best
responder: in order to have an equilibrium, if it is indifferent between some
actions, it should choose the one that maximizes the patroller's expected
utility. In [16], the authors show that in any two-player strategic-form game
the leader never gets worse by committing to a leader-follower equilibrium
strategy than by playing a Nash equilibrium strategy. Therefore, the leader
will always commit to a leader-follower equilibrium strategy. However, to
the best of our knowledge, there is no similar result for the game we
are dealing with (a two-player extensive-form game with imperfect information
and infinite horizon). In what follows we extend the result presented in [16]
to our game.

First, we consider the patroller's strategy in the absence of any commitment;
later we will show that the patroller never gets worse when it commits to
a leader-follower equilibrium strategy. The appropriate solution concept for 
an extensive-form game with imperfect information is the sequential equi- 
librium jiij], which is a refinement of Nash equilibrium. More precisely, a 
sequential equilibrium is a pair $(\sigma, \mu)$ where $\sigma$ is the agents' strategy profile
and $\mu$ is a system of beliefs (it prescribes how agents update their beliefs
during the game). In a sequential equilibrium, the strategies are guaranteed
to be rational (sequential rationality) and the beliefs to be consistent with
the agents' optimal strategies (Kreps and Wilson's consistency). The presence
of an infinite horizon complicates the study of the game. Considering
our model, the patroller's strategy $\sigma_{\mathbf{p}}(h)$ is in principle infinite, $h$ being in
principle infinitely long. With an infinite horizon, classic game theory studies
a game by introducing symmetries, e.g., an agent will repeat a given strategy
every $k$ turns. (A classical example is Rubinstein's alternating-offers
protocol [26], where a buyer and a seller can negotiate without any deadline.)
Introducing symmetries in our model amounts to fixing the length $l$ of the
histories in $H$. That is, in the patroller's strategies, the next action is selected
on the basis of the last $l$ actions, with $l$ finite and constant during all the
game. For instance, when $l = 0$, actions in the patroller's strategy do not
depend on any previously taken action. Namely, the probability to visit a
vertex does not depend on the (adjacent) vertex where the patroller currently
is. Hence, when $l = 0$, the patroller performs well only in environments that
are fully connected. When $l = 1$, the patroller chooses its next action on the
basis of its last action and then its strategy is Markovian. In this case, the
selection of the next vertex to visit depends only on the current vertex of the
patroller. Reasonably, when increasing the value of $l$, the expected utility of
the patroller never decreases, because the patroller considers more information
to select its next action. Classic game theory shows that infinite horizon
extensive-form games admit a maximum length, say $\bar{l}$, of the symmetries such
that beyond $\bar{l}$ the expected utility does not increase anymore 0. In our
model, this means that when the patroller's strategy is defined on the last $l$
vertices with $l > \bar{l}$ the patroller's expected utility is the same it receives when
$l = \bar{l}$. On the other hand, we expect that when increasing the value of $l$, the
computational complexity for finding a patrolling strategy increases. This is
because the spaces of patroller's and intruder's strategies become larger. In
particular, the number of possible pure strategies $\sigma_{\mathbf{p}}(h)$ and $\sigma_{\mathbf{i}}(h)$ is $O(n^l)$,
where $n$ is the number of vertices. In practical settings, the selection of a
value for $l$ is a trade-off between expected utility and computational effort.
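
The growth of the strategy space with the history length can be observed by enumerating the histories on which a strategy is defined. The Python sketch below counts the arc-respecting vertex sequences of a given length on a small graph and compares the count with the bound n^l. The function name `histories` and the four-vertex bidirectional ring are assumptions made for illustration; on sparse graphs the actual count stays well below the bound, while on fully connected graphs it meets it.

```python
def histories(arcs, vertices, l):
    """All vertex sequences of length l (l >= 1) that follow arcs of the graph."""
    seqs = [(v,) for v in sorted(vertices)]
    for _ in range(l - 1):
        seqs = [s + (j,) for s in seqs
                for j in sorted(vertices) if (s[-1], j) in arcs]
    return seqs


# Hypothetical graph: bidirectional ring 1-2-3-4.
V = {1, 2, 3, 4}
arcs = {(1, 2), (2, 3), (3, 4), (4, 1), (2, 1), (3, 2), (4, 3), (1, 4)}

for l in (1, 2, 3):
    count = len(histories(arcs, V, l))
    print(f"l={l}: {count} histories, bound n^l = {len(V) ** l}")
```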

Given a value for $l$, our game can be essentially reduced to a strategic-form
game, because the game repeats every $l$ turns. Therefore, we can consider
a reduced game that is $l$ turns long and constrain agents' possible strategies
to be indefinitely repeated in all $l$-turn games. In our case, the patroller's
strategy $\sigma_{\mathbf{p}}$ can be represented by a collection $\{\alpha_{h,i}\}$, where $\alpha_{h,i}$ is the
probability to execute action move($i$) given history $h$. The intruder's strategy
$\sigma_{\mathbf{i}}$ can be conveniently represented by using the following macro-actions:
enter-when($i$, $h$) and stay-out. Action enter-when($i$, $h$) corresponds to playing
wait until the patroller has followed history $h$ and then playing enter($i$);
stay-out corresponds to playing wait forever.
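
To see how a macro-action is evaluated, consider the Markovian case (history length 1), where the patroller's strategy reduces to a probability of moving from each vertex to each adjacent vertex. The Python sketch below computes the probability that the intruder is captured when it enters a target right after observing the patroller in a given vertex, and the resulting expected utility of the intruder. It assumes that capture happens if the patroller reaches the target within the next d moves; the function names, the dictionary encoding `alpha[i][j]`, and the uniform strategy on a four-vertex ring are assumptions made for illustration.

```python
def capture_probability(alpha, s, t, d):
    """P(patroller visits t within its next d moves | it is now in s)."""
    # Probability mass of trajectories that have not visited t so far,
    # indexed by the patroller's current vertex.
    alive = {s: 1.0}
    for _ in range(d):
        nxt = {}
        for i, mass in alive.items():
            for j, p in alpha[i].items():
                if j != t:          # moving into t captures the intruder
                    nxt[j] = nxt.get(j, 0.0) + mass * p
        alive = nxt
    return 1.0 - sum(alive.values())


def intruder_expected_utility(alpha, s, t, d, v_i, epsilon):
    """Expected utility of enter-when(t, s) for a risk-neutral intruder."""
    c = capture_probability(alpha, s, t, d)
    return (1.0 - c) * v_i - c * epsilon


# Hypothetical Markovian strategy: uniform random walk on the ring 1-2-3-4.
alpha = {
    1: {2: 0.5, 4: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {3: 0.5, 1: 0.5},
}
print(capture_probability(alpha, s=1, t=2, d=3))
print(intruder_expected_utility(alpha, s=1, t=2, d=3, v_i=1.0, epsilon=1.0))
```

Comparing such expected utilities over all targets and observed vertices, together with the value 0 of stay-out, gives the intruder's best response to a fixed Markovian strategy.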

We now show that when the leader (in our case the patroller) commits to 
a leader-follower equilibrium it obtains an expected utility that is not worse 
than the expected utility it would obtain from a sequential equilibrium. 

Theorem 2.1. Given the game described above with a fixed $l$, the leader
never gets worse when committing to a leader-follower equilibrium strategy
with respect to a sequential equilibrium strategy.

Proof. The finite horizon game we obtain after having introduced symmetries
can be easily translated into a strategic-form game, as prescribed by
the classic normal form of an extensive-form game [10]. If the leader does
not commit to any strategy, it receives the expected utility prescribed by a
sequential equilibrium of the game. This equilibrium is a specific Nash
equilibrium of the strategic-form game. By von Stengel and Zamir [16], in any
two-player strategic-form game, the worst leader-follower equilibrium is not
worse than the best Nash equilibrium for the leader, and therefore, in our
case, the worst leader-follower equilibrium is not worse than any sequential
equilibrium for the leader (patroller). The thesis of the theorem follows. □

According to the above theorem, in our patrolling setting the patroller
(leader) will commit to the leader-follower equilibrium strategy, which is the
optimal patrolling strategy.
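
The advantage of commitment can be seen on a toy two-by-two strategic-form game. The Python sketch below is not the patrolling game and considers pure strategies only, whereas the result invoked in the proof concerns commitments to possibly mixed strategies. The payoff matrices and the function names `pure_nash` and `pure_commitment` are assumptions made for illustration.

```python
def pure_nash(U_l, U_f):
    """Pure-strategy Nash equilibria of a bimatrix game (leader picks the row)."""
    rows, cols = range(len(U_l)), range(len(U_l[0]))
    return [(r, c) for r in rows for c in cols
            if U_l[r][c] == max(U_l[x][c] for x in rows)
            and U_f[r][c] == max(U_f[r][y] for y in cols)]


def pure_commitment(U_l, U_f):
    """Best pure commitment of the leader; the follower best-responds and
    breaks ties in favour of the leader."""
    best = None
    for r in range(len(U_l)):
        top = max(U_f[r])
        c = max((c for c in range(len(U_f[r])) if U_f[r][c] == top),
                key=lambda c: U_l[r][c])
        if best is None or U_l[r][c] > U_l[best[0]][best[1]]:
            best = (r, c)
    return best


# Hypothetical payoffs: rows are leader actions, columns are follower actions.
U_leader = [[2, 4],
            [1, 3]]
U_follower = [[1, 0],
              [0, 1]]

nash = pure_nash(U_leader, U_follower)
r, c = pure_commitment(U_leader, U_follower)
print("Nash equilibria:", nash, "leader payoffs:", [U_leader[i][j] for i, j in nash])
print("commitment:", (r, c), "leader payoff:", U_leader[r][c])
```

In this game the first row is dominant for the leader, so without commitment it obtains 2; by committing to the second row it induces the follower to switch column and obtains 3.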

2.4. Summary of the Main Results

The main contribution of this paper is an algorithm to compute the optimal
patrolling strategy (i.e., the leader-follower equilibrium strategy) for the
setting we consider. We present our algorithm in Sections 3 and 4. For the
sake of presentation, we discuss our algorithm under the assumption that
the setting is such that the optimal patrolling strategy gives the patroller
a non-null probability of visiting all the targets. Otherwise, at least one
target would never be visited (when following the optimal patrolling strategy)
and the intruder would always succeed in entering that target. In these
cases, multiple robots should be employed. We will see in Section 5 how our
approach can be used to capture some multirobot settings.

One of the most important features we considered in developing the proposed
algorithm is computational efficiency. In order to achieve good performance
in the process of finding a leader-follower equilibrium, and thus
an optimal patrolling strategy, we divided this process into two steps. In the
first one, the algorithm searches for a deterministic patrolling strategy such
that the intruder's best response is to play stay-out. If such a strategy
exists, then it is a leader-follower equilibrium strategy because it provides the
patroller with the largest utility. Instead, if such a strategy does not exist,
then the algorithm proceeds with the second step, where it searches for an
equilibrium characterized by a mixed, or non-deterministic, strategy. The
main reason behind this two-step process is that searching for a deterministic
equilibrium strategy is computationally less expensive than searching for
a non-deterministic one. Therefore, solving the first step separately leads to
a significant improvement in computational performance every time a
deterministic equilibrium strategy can be found. In the following we summarize
the details of the two steps.

In Section 3 the algorithm for computing a deterministic equilibrium
patrolling strategy is presented. A deterministic equilibrium strategy is found
as the solution of a feasibility problem, which is computed by resorting to
constraint programming techniques. When solving this problem, the patrolling
setting can be significantly simplified by reducing the graph to target vertices
only. Moreover, another important advantage coming from this formulation
is that it does not depend on the history length, namely, the value of $l = |h|$
can be arbitrarily large.

In Section 4 the non-deterministic case is discussed. When the problem is
to find a non-deterministic equilibrium strategy, the patrolling setting must
be entirely considered, differently from the previous case. Moreover, the
mathematical formulation of the problem strongly depends on the history
length $l$, which significantly influences the computational effort needed
to compute its solution. The algorithm we propose works in three steps.
First, in order to simplify the resolution process, all the agents' dominated
actions, i.e., the actions that a rational agent would never play, are
discarded. In the second step, the algorithm searches for an equilibrium where
the intruder's strategy is stay-out and, if such an equilibrium is found, the
corresponding non-deterministic strategy of the patroller is taken as the optimal
patrolling strategy. Conversely, if the second step does not find an admissible
solution, the algorithm proceeds with the last step, where a leader-follower
equilibrium is computed and the optimal patrolling strategy is found. The
problem addressed in the second step is formulated as a bilinear feasibility
mathematical programming problem, while that addressed in the third step
is formulated as a multi-bilinear optimization problem.

3. Finding Deterministic Equilibrium Patrolling Strategies 

In this section, we formally state the problem of finding a deterministic
equilibrium patrolling strategy (Section 3.1), we survey the main related
works (Section 3.2), we study the computational complexity of this problem
(Section 3.3), we provide our solving algorithm (Section 3.4), and we
experimentally evaluate it (Section 3.5).

3.1. Problem Formulation 

A deterministic patrolling equilibrium strategy $\sigma_p$ is represented as a
sequence of vertices, or equivalently as a sequence of actions move(i), such
that, when adopted by the patroller, each target is left uncovered for a
number of turns not larger than its penetration time and thus the optimal action
of the intruder is stay-out. Indeed, if the intruder attempts to enter a target,
it will be captured by the patroller. (We recall that we are interested in
studying situations wherein the equilibrium patrolling strategy covers all the
targets.) For algorithmic reasons, we will consider deterministic equilibrium
patrolling strategies that are cyclic, namely that are composed of a finite
sequence of vertices starting and ending with the same vertex and that, when
indefinitely repeated by the patroller, make entering any target not rational
for the intruder. As we shall show below, our representation of deterministic
patrolling strategies does not fix any upper bound on the length $l$ of
history $h$.

At first, we show that the problem of searching for a deterministic
equilibrium strategy in a graph G is equivalent to searching for a deterministic
equilibrium strategy in a reduced graph G'. More precisely, we show that if
G' admits a deterministic equilibrium strategy $\sigma'_p$, then we can always
derive from $\sigma'_p$ a strategy $\sigma_p$ that is a deterministic equilibrium
strategy for G, and, if G' does not admit any deterministic equilibrium strategy,
then G does not either. Reducing G to G', searching for a deterministic
equilibrium strategy $\sigma'_p$ for G', and deriving $\sigma_p$ from $\sigma'_p$
will allow us to save a large amount of computational time with respect to
searching directly for $\sigma_p$ in G. We start by discussing how we can
reduce G to G'.

The idea at the basis of the reduction is that G' is composed only of targets
and the patroller will move between two targets along a shortest path
connecting them. Formally, starting from $G = (V, A, T, v, d)$, we define
$G' = (T, A', w, d)$, where targets T are the vertices of G'; A' is the set of
arcs connecting the targets, defined as a function $a' : T \times T \to \{0, 1\}$
and derived from set A as follows: for every pair of targets $i, j \in T$ with
$i \neq j$, $a'(i, j) = 1$ if at least one of the shortest paths connecting i to j
in G does not pass through any other target, and $a'(i, j) = 0$ otherwise; w is a
weight function defined as $w : T \times T \to \mathbb{N}$, where $w(i, j)$ is the
length of the shortest path between i and j in G ($w(i, j)$ is defined only when
$a'(i, j) = 1$); d is defined as in G. Notice that agents' values do not appear
in G'. This is because the computation of a deterministic equilibrium strategy
covering all the targets does not depend on their evaluations. The reduction
algorithm develops in two steps.

1. The shortest paths connecting each pair of targets are computed by
repeatedly applying Dijkstra's algorithm [27] to G. (Dijkstra's algorithm
is applicable since no negative arc weight appears in G.) The
asymptotic worst-case computational complexity is $O(t n^2)$, where t
is the number of targets in T and n is the number of vertices in V.
2. All the shortest paths computed in the previous step are analyzed and,
for every path that does not contain targets (except for the source
and the destination), the corresponding arc is added to A' (avoiding
duplications) and its weight w is set equal to the length of the shortest
path. This is accomplished by applying a linear search to each shortest
path, with a resulting computational complexity of $O(t^2 n)$.
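
The two reduction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `dijkstra`, `reduce_graph`, and the adjacency-dictionary input format are hypothetical. Instead of scanning one stored shortest path per pair, the sketch tests the definition of $a'(i, j)$ directly by comparing the unrestricted shortest distance with the shortest distance obtained when no other target may be used as an intermediate vertex; the two coincide exactly when at least one shortest path avoids all other targets.

```python
import heapq

def dijkstra(adj, source, blocked=frozenset()):
    """Shortest distances from `source`. Vertices in `blocked` may be
    reached, but are never used as intermediate vertices of a path."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u] or u in blocked:
            continue  # stale heap entry, or a vertex we must not pass through
        for v, weight in adj.get(u, {}).items():
            if d_u + weight < dist.get(v, float("inf")):
                dist[v] = d_u + weight
                heapq.heappush(heap, (dist[v], v))
    return dist

def reduce_graph(adj, targets):
    """Return w as a dict {(i, j): w(i, j)} holding one entry for every
    ordered pair of targets with a'(i, j) = 1 (hypothetical encoding)."""
    targets = set(targets)
    w = {}
    for i in targets:
        full = dijkstra(adj, i)                          # distances in G
        free = dijkstra(adj, i, blocked=targets - {i})   # no target in between
        for j in targets - {i}:
            if j in full and free.get(j) == full[j]:
                w[(i, j)] = full[j]
    return w
```

For a corridor 1 - 2 - 3 - 4 with unit weights and targets {1, 3, 4}, the sketch keeps the arcs between 1 and 3 and between 3 and 4, and drops the pair (1, 4), whose only shortest path passes through target 3.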

Consider the graph reported in Figure HJ The corresponding reduced graph 
G' is reported in Figure 4(a). G' is composed of only 5 vertices. (We report
another graph in Figure 4(b) that differs from that in (a) in the penetration
times; we shall use it as an example in the following sections.)

We now show that, given a strategy $\sigma'_p$ for G', we can derive a strategy
$\sigma_p$ for G. Specifically, $\sigma_p$ is built such that, for any pair of
consecutive vertices $i, j$ in $\sigma'_p$, there is a sequence of vertices
$(i, z_1, \ldots, z_m, j)$ in $\sigma_p$ that is a shortest path connecting i
to j. We can now state the following proposition.



Proposition 3.1. If G' admits a deterministic equilibrium strategy $\sigma'_p$,
then $\sigma_p$ (derived as discussed above) is a deterministic equilibrium
strategy for G, and, if G' does not admit any deterministic equilibrium strategy,
then G does not either.

Proof. The proof of the first implication is trivial. Indeed, given any
deterministic equilibrium strategy $\sigma'_p$ defined on G', the corresponding
$\sigma_p$ is a deterministic equilibrium strategy in G, because the time (number
of turns) between two successive visits to a target in $\sigma'_p$ is the same as
the time between two successive visits to the same target in $\sigma_p$ (by
definition of weights w in G'). To demonstrate the second implication we proceed
by contradiction, assuming that G admits a solution and that G' does not. This
means that the solution in G prescribes the patroller to move between some targets
along paths different from the shortest ones. However, if no solution that
uses shortest paths can cover all the targets without leaving any one of them
uncovered for more than its penetration time, then no other solution can do
it because, when shortest paths are not used, the time between two successive
visits to a target can only increase. Therefore, it follows by contradiction
that if G' does not admit any solution, then G does not either. □




Figure 4: (a) Reduced graph G' corresponding to that of Figure Q] (b) The same graph 
as in (a), but with different penetration times. 

We are now in the position to formally define the problem of searching
for a deterministic equilibrium strategy. For the sake of presentation, from
here on we use $\sigma$ in place of $\sigma'_p$ and we talk of vertices of G'
and $\sigma$, instead of targets. Formally, we define a function
$\sigma : \{1, 2, \ldots, s\} \to T$, where $\sigma(j)$ is the j-th element of
the sequence. The length of the sequence is s. The temporal length of a sequence
of visits is computed by summing the weights of covered arcs, i.e., by summing
the times (in number of turns) for covering arcs,
$\sum_{j=1}^{s-1} w(\sigma(j), \sigma(j+1))$. The time interval between two visits
of a vertex is calculated similarly, summing the weights of the arcs covered
between the two visits.

A solution of our problem (a deterministic equilibrium patrolling strategy)
is a sequence $\sigma$ such that the following properties are satisfied:

1. $\sigma$ is cyclical, i.e., the first vertex coincides with the last one,
namely, $\sigma(1) = \sigma(s)$;

2. every vertex in T is visited at least once, i.e., there are no uncovered
vertices;

3. when indefinitely repeating the cycle, for any $i \in T$, the time interval
between two successive visits of i is never larger than $d(i)$.

It is clear that, when the patroller repeatedly follows such a sequence $\sigma$
as its patrolling strategy, no intrusion can occur.

Let us denote by $O_i(j)$ the position in $\sigma$ of the j-th occurrence of i
and by $o_i$ the total number of occurrences of i in $\sigma$. For instance,
consider Figure 4(a): given $\sigma = (14, 08, 18, 08, 14)$, $O_{08}(1) = 2$ and
$O_{08}(2) = 4$, while $o_{08} = 2$ and $o_{06} = 0$. Notice that, given a
sequence $\sigma$, quantities $O_i(j)$ and $o_i$ can be easily calculated.

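Given such definitions we can formally re-state the problem in a mathematical
programming fashion. We aim at finding a sequence $\sigma$ of s visits to
vertices of G' (targets) such that the following constraints are satisfied:

\begin{align}
\sigma(1) &= \sigma(s) \tag{1} \\
o_i &\geq 1 && \forall i \in T \tag{2} \\
a'(\sigma(j-1), \sigma(j)) &= 1 && \forall j \in \{2, 3, \ldots, s\} \tag{3} \\
\sum_{j=O_i(k)}^{O_i(k+1)-1} w(\sigma(j), \sigma(j+1)) &\leq d(i) && \forall i \in T, \forall k \in \{1, 2, \ldots, o_i - 1\} \tag{4} \\
\sum_{j=1}^{O_i(1)-1} w(\sigma(j), \sigma(j+1)) + \sum_{j=O_i(o_i)}^{s-1} w(\sigma(j), \sigma(j+1)) &\leq d(i) && \forall i \in T \tag{5}
\end{align}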

Constraint (1) states that $\sigma$ is a cycle, i.e., the first and last vertices
of $\sigma$ coincide; constraints (2) state that every vertex is visited at least
once in $\sigma$; constraints (3) state that for every pair of consecutively
visited vertices, say $\sigma(j-1)$ and $\sigma(j)$, it is
$a'(\sigma(j-1), \sigma(j)) = 1$, i.e., vertex $\sigma(j)$ can be
directly reached from vertex $\sigma(j-1)$ in G'; constraints (4) state that, for
every vertex i, the temporal interval between two successive visits of i in
$\sigma$ is not larger than $d(i)$; similarly, constraints (5) state that for
every vertex i the temporal interval between the last and first visits of i is
not larger than $d(i)$, i.e., the deadline of i must be respected also along the
cycle closure. Hence, our goal is to find a sequence of vertices
$\sigma(1), \sigma(2), \ldots, \sigma(s)$ such that the above constraints are
satisfied. Notice that also the length s of the sequence must be found as part of
the solution. In Section 3.4 we present a method for finding a sequence $\sigma$
that satisfies constraints (1)-(5).
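
As an illustration of how a candidate sequence can be verified against constraints (1)-(5), the following Python sketch may help. It is not part of the original algorithm: the function names and the encoding of G' as a dictionary `w` (one key `(i, j)` for each arc with $a'(i, j) = 1$) are assumptions made for the example, and positions are 1-based to match the notation $O_i(j)$.

```python
def occurrences(sigma, i):
    """1-based positions O_i(1), O_i(2), ... of vertex i in sigma."""
    return [p for p, v in enumerate(sigma, start=1) if v == i]

def span(sigma, w, start, end):
    """Temporal length of sigma from position `start` to position `end`
    (1-based), i.e. the sum of w(sigma(j), sigma(j+1)) for start <= j < end."""
    return sum(w[(sigma[j - 1], sigma[j])] for j in range(start, end))

def is_solution(sigma, targets, w, d):
    """Check constraints (1)-(5) for a candidate sequence sigma."""
    s = len(sigma)
    if s == 0 or sigma[0] != sigma[-1]:                       # (1)
        return False
    if any(not occurrences(sigma, i) for i in targets):       # (2)
        return False
    if any((sigma[j - 1], sigma[j]) not in w for j in range(1, s)):  # (3)
        return False
    for i in targets:
        pos = occurrences(sigma, i)
        for a, b in zip(pos, pos[1:]):                        # (4)
            if span(sigma, w, a, b) > d[i]:
                return False
        closure = span(sigma, w, 1, pos[0]) + span(sigma, w, pos[-1], s)
        if closure > d[i]:                                    # (5)
            return False
    return True
```

On the sequence (14, 08, 18, 08, 14) used above, `occurrences` returns positions 2 and 4 for vertex 08 and an empty list for vertex 06, matching $O_{08}(1) = 2$, $O_{08}(2) = 4$, and $o_{06} = 0$.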

If we consider, for instance, the problem described by the graph of
Figure 4(a), it is easy to show that no sequence of visits can satisfy all the
constraints described above. To see this, it is enough to observe that the
shortest cycle covering only vertices 06 and 08, i.e., (06, 08, 06), has a
temporal length larger than the penetration times of both the involved vertices,
so there is no way to cover these vertices (and others) within their
penetration times. As we shall show below, the problem described by the graph of
Figure 4(b) admits a deterministic equilibrium strategy.

3.2. Related Works

To the best of our knowledge, the strategic patrolling literature does
not present significant works for finding a deterministic equilibrium strategy.
In particular, it does not take into account the penetration times of the
targets, see, e.g., [28], [29]. A larger number of works related to our problem
can be found in the operational research literature, since the problem is
ultimately that of finding a particular sequence of vertices in the graph G'.
The most relevant of these works are extensions and refinements of the Traveling
Salesman Problem (TSP). The first extension of the TSP we consider relates
to settings with temporal constraints and is called deadline-TSP [30]. In
this problem vertices have deadlines over their first visit and some time is
spent traversing arcs. Rewards are collected when a vertex is visited before
its deadline, while penalties are assigned when a vertex is either visited after
its deadline or is not visited at all. The objective is to find a tour that
maximizes the reward, visiting as many vertices as possible. Vertices can be
visited more than once in the tour, but the reward/penalty is received only
at the first visit. A more general variant is the Vehicle Routing Problem with
Time Windows [31], where deadlines are replaced with time windows during
which visits of vertices must occur. This problem has also been studied by
employing constraint programming techniques [32]. Cyclical sequences of
visits are addressed in the period routing problem [33], where vehicle routes
are constructed to run for a finite period of time in which every vertex has to
be visited according to a given frequency. Frequencies can also be given as
lower bounds, considering the real frequencies of visits as decision variables
of the problem [34]. In the cyclic inventory routing problem [35] vertices
represent customers with a given demand rate and storage capacity. The
objective is to find a tour such that a distributor can repeatedly restock
customers under some constraints over visiting frequencies.

Two main issues distinguish our problem from those described
above. The first one is that our problem is defined according to relative
deadlines (calculated with respect to two consecutive visits to the same vertex)
and the absolute deadlines (calculated with respect to the beginning of the
sequence) depend on the solution itself, as expressed by constraints (4)-(5).
The extension of the above works, mostly considering absolute deadlines, to
our problem introduces highly non-linear constraints and does not seem to
be straightforward. The second issue is that in our problem we aim at finding
any feasible solution, and not the optimal solution according to some metric.
Thus, we are solving a feasibility problem and not an optimization problem.

3.3. NP-Completeness

In this section we discuss the computational complexity of finding a de- 
terministic equilibrium strategy for our patrolling setting, denoted as the 
DET-STRAT problem. 

Theorem 3.2. The DET-STRAT problem is NP-complete. 

Proof. We prove the NP-completeness by reducing the Directed Hamiltonian
Circuit problem (DHC) to the DET-STRAT problem. DHC is
the problem of determining if a Hamiltonian circuit, i.e., a cycle that
visits each vertex exactly once, exists in a given directed graph. This is a
well-known NP-complete problem. (For the sake of simplicity, we consider
a graph $G = (V, A, T, v, d)$. A similar, but more complicated, proof can be
produced considering a graph $G' = (T, A', w, d)$.) Let us consider a generic
instance of the DHC problem given by a directed graph $G_h = (V_h, A_h)$, where
$V_h$ is the set of vertices and $A_h$ is the set of arcs. In order to prove that
DHC can be reduced to the DET-STRAT problem, we show that for every
instance $G_h$ of the DHC problem an instance $G_s$ of the DET-STRAT problem
can be built in polynomial time and that by solving the DET-STRAT
problem on $G_s$ we also obtain a solution for the DHC problem on $G_h$. An
instance $G_s = (V_s, A_s, T_s, v_s, d_s)$ can be easily constructed from $G_h$ in
the following way: $V_s = T_s = V_h$, $A_s = A_h$, for every $v \in V_s$ we impose
$d_s(v) = |V_h|$, and the functions $v_s$ can be arbitrarily defined. It is
straightforward to see that a solution of $G_s$, if it exists, is a Hamiltonian
cycle. Indeed, since the relative deadline of every target is equal to the number
of targets, a deterministic equilibrium strategy should visit each target exactly
once, otherwise at least one relative deadline would be violated (recall that
covering one arc in $G_s$ is assumed to require one time unit). Therefore,
computing the solution for $G_s$ provides by construction a solution for $G_h$ or,
in other words, the DHC problem can be reduced to the DET-STRAT problem, proving
its NP-completeness (it is trivially polynomial to verify that a given sequence of
vertices is a solution of the DET-STRAT problem). □
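
The construction used in the proof can be made concrete with a short Python sketch. The function names and the dictionary encoding of the instance are hypothetical, and the brute-force search is exponential and included only to illustrate, on tiny graphs, that the solutions of the constructed instance are the Hamiltonian circuits of the original graph; it is not a practical solver.

```python
from itertools import permutations

def dhc_to_det_strat(vertices, arcs):
    """Build the DET-STRAT instance of the proof: every vertex is a target,
    arcs are unchanged (one time unit each), and every deadline is |V_h|."""
    n = len(vertices)
    return {
        "T": set(vertices),
        "A": set(arcs),
        "d": {v: n for v in vertices},
    }

def solve_by_enumeration(instance):
    """Enumerate cyclic sequences that visit each target exactly once.
    With unit arc costs and deadlines equal to the number of targets,
    these are the only sequences that can respect all the deadlines."""
    verts = sorted(instance["T"])
    first, rest = verts[0], verts[1:]
    for perm in permutations(rest):
        sigma = [first, *perm, first]
        if all((a, b) in instance["A"] for a, b in zip(sigma, sigma[1:])):
            return sigma
    return None
```

For the directed triangle with arcs 1→2, 2→3, 3→1 the sketch returns the cycle (1, 2, 3, 1); for a directed graph with no Hamiltonian circuit it returns `None`.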

3.4. Solving Algorithm

In this section, we present our basic solving algorithm (Section 3.4.1), we
report an example (Section 3.4.2), we theoretically analyze some properties
of the algorithm (Section 3.4.3), and we show how to improve its efficiency
(Section 3.4.4).

3.4.1. Basic Algorithm

We formulate the problem of finding a deterministic equilibrium patrolling
strategy described in Section 3.1 as a Constraint Satisfaction Problem
(CSP) [37]. Each $\sigma(j)$ is considered as a variable with domain
$F_j \subseteq T$. The constraints over the values of the variables are (1)-(5).
A solution is an assignment of values to all the variables such that all those
constraints are satisfied. When the problem is put in this form, it resembles some
problems of CSP-based scheduling. When compared with the large literature in AI
that studies scheduling problems, cyclic scheduling presents comparatively limited
results (e.g., [38]). Differently from the problems studied in the literature, in
our problem the number s of variables $\sigma(j)$ that compose the solution is
not known in advance and must be computed as part of the solution. This is
because a vertex can appear multiple times in $\sigma$. As a consequence, we cannot
resort to constraint programming tools, e.g., ILOG CP [39], and we need to
develop an ad hoc algorithm.

The algorithm we propose for finding a solution basically searches the
state space with backtracking. Forward checking [37] is used in the attempt
to reduce the branching of the search tree. We report our algorithm in
Algorithms 1, 2, and 3.

Algorithm 1 simply assigns $\sigma(1)$ a vertex $i \in T$. Notice that if a
solution exists, it can be found independently of the first vertex appearing in
$\sigma$. Since the solution $\sigma$ is a cycle that visits all vertices, every
vertex can be chosen as the initial one. Hence, the choice of i in Algorithm 1
does not affect the possibility of finding a solution.



Algorithm 1: FIND_SOLUTION(T, A', w, d)

1 select a vertex i in T
2 assign $\sigma(1) \leftarrow i$
3 call recursive_call(T, A', w, d, $\sigma$, 2)



Algorithm 2 assigns $\sigma(j)$ a vertex from domain $F_j \subseteq T$, which
contains the available values for $\sigma(j)$ returned by the forward checking
algorithm (Algorithm 3). If $F_j$ is empty or no vertex in $F_j$ can be
successfully assigned to $\sigma(j)$, then Algorithm 2 returns failure and
backtracking is performed.

Algorithm 2: RECURSIVE_CALL(T, A', w, d, $\sigma$, j)

1 if $\sigma(1) = \sigma(j-1)$ and constraints (2) hold then
2     if constraints (5) hold then
3         return $\sigma$
4     else
5         return FAILURE
6 else
7     assign $F_j \leftarrow$ forward_checking(T, A', w, d, $\sigma$, j)
8     for all the i in $F_j$ do
9         assign $\sigma(j) \leftarrow i$
10        assign $\sigma' \leftarrow$ recursive_call(T, A', w, d, $\sigma$, j + 1)
11        if $\sigma'$ is not FAILURE then
12            return $\sigma'$
13    return FAILURE

Algorithm 3 restricts $F_j$ to the vertices directly reachable from the last
assigned vertex $\sigma(j-1)$ such that their visits do not violate constraints
(4)-(5). Notice that checking constraints (4)-(5) requires knowing the weights
(temporal costs) related to the arcs between the vertices that could be assigned
subsequently, i.e., between the variables $\sigma(k)$ with $k > j$. For example,
consider the graph of Figure 4(b) and suppose that the partial solution currently
constructed by the algorithm is $\sigma = (14)$. In this situation, we cannot check
the validity of constraints (4)-(5) since we have no information about the times to
cover the arcs between the vertices that will complete the solution. In order
to cope with this, we estimate the unknown temporal costs by employing an
admissible heuristic (i.e., a non-strict underestimate) based on the minimum
cost between two vertices. The heuristic being admissible, no feasible solution
is discarded. We denote the heuristic value by $\hat{w}$, e.g.,
$\hat{w}(i, \sigma(1))$ denotes the weight of the shortest path between i and
$\sigma(1)$. We assume $\hat{w}(i, i) = 0$ for any i.

Given a partial solution $\sigma$ from 1 to $j-1$, the forward checking algorithm
considers all the vertices directly reachable from $\sigma(j-1)$ and keeps those
that do not violate the relaxed constraints (4)-(5) computed with heuristic values.
It considers a vertex i directly reachable from $\sigma(j-1)$ and assumes that
$\sigma(j) = i$. Step 5 of Algorithm 3 checks relaxed constraints (5) with respect
to i, assuming that the weight along the cycle closure from $\sigma(j) = i$ to
$\sigma(1)$ is minimum. In the above example, with $\sigma(1) = 14$, the vertices
directly reachable from $\sigma(1)$ are 08 and 18. The algorithm considers
$\sigma(2) = 08$. By Step 5, we have
$w(\sigma(1), 08) + \hat{w}(08, \sigma(1)) = 4 < d(08) = 18$ and then Step 5
is satisfied. It can be easily observed that such a condition holds also at the
next iteration of the cycle, when $\sigma(2) = 18$. Step 8 of Algorithm 3 checks
relaxed constraints (5) with respect to all the vertices $k \neq i$, assuming that
both the weight to reach k from $\sigma(j) = i$ and the weight along the cycle
closure from k to $\sigma(1)$ are minimum. Consider again the above example.
It can be easily observed that when $\sigma(2) = 08$ such conditions hold for
all k. Instead, at the next iteration, when $\sigma(2) = 18$ and $k = 06$, we have
$w(\sigma(1), 18) + \hat{w}(18, 06) + \hat{w}(06, \sigma(1)) = 16 > d(06) = 14$.
The relaxed constraint is violated and vertex 18 will not be inserted in $F_j$.
Similarly, Step 6 checks relaxed constraints (4) with respect to i and Step 9
checks relaxed constraints (4) with respect to any k, assuming that the weight to
reach k from $\sigma(j) = i$ is minimum. In the above example, starting from
$\sigma = (14)$, the relaxed constraints are satisfied only when $i = 08$ and
therefore $F_j = \{08\}$. Finally, we notice that Steps 5 and 8 are checked only
when $o_i = 0$ and $o_k = 0$, respectively, since it can be easily proved that
when $o_i > 0$ and $o_k > 0$ these conditions always hold.



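Algorithm 3: FORWARD_CHECKING(T, A', w, d, $\sigma$, j)

1 assign $F_j \leftarrow \emptyset$
2 assign $s \leftarrow j - 1$
3 for all members i in T such that $a'(\sigma(s), i) = 1$ do
4     if conditions
5         ( $o_i = 0 \wedge \sum_{l=1}^{s-1} w(\sigma(l), \sigma(l+1)) + w(\sigma(s), i) + \hat{w}(i, \sigma(1)) \leq d(i)$ or
6           $o_i > 0 \wedge \sum_{l=O_i(o_i)}^{s-1} w(\sigma(l), \sigma(l+1)) + w(\sigma(s), i) \leq d(i)$ ) and,
7         for all $k \neq i$,
8         ( $o_k = 0 \wedge \sum_{l=1}^{s-1} w(\sigma(l), \sigma(l+1)) + w(\sigma(s), i) + \hat{w}(i, k) + \hat{w}(k, \sigma(1)) \leq d(k)$ or
9           $o_k > 0 \wedge \sum_{l=O_k(o_k)}^{s-1} w(\sigma(l), \sigma(l+1)) + w(\sigma(s), i) + \hat{w}(i, k) \leq d(k)$ )
10    hold then
11        add i to $F_j$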
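
To make the interplay of Algorithms 1, 2, and 3 concrete, the following Python sketch implements backtracking with forward checking along the lines described above. It is a sketch under stated assumptions rather than the authors' code: G' is encoded as a dictionary `w` with one key `(i, j)` per arc, the heuristic values are obtained with a Floyd-Warshall pass (our choice, the text only requires an admissible underestimate), arc weights are assumed to be positive, and the first vertex is chosen as the smallest one instead of randomly. The checks of Steps 5-6 and 8-9 are merged into one loop over k, with zero extra travel time when k = i.

```python
def all_pairs_shortest(targets, w):
    """Heuristic values: shortest travel time between any two targets in G'."""
    inf = float("inf")
    hat = {i: {j: 0 if i == j else w.get((i, j), inf) for j in targets}
           for i in targets}
    for k in targets:
        for i in targets:
            for j in targets:
                hat[i][j] = min(hat[i][j], hat[i][k] + hat[k][j])
    return hat

def elapsed(sigma, w, start):
    """Time spent along sigma from 0-based index `start` to its last element."""
    return sum(w[(sigma[l], sigma[l + 1])] for l in range(start, len(sigma) - 1))

def forward_checking(sigma, targets, w, d, hat):
    """Algorithm 3: candidates for the next position that pass the relaxed checks."""
    last_seen = {v: idx for idx, v in enumerate(sigma)}
    domain = []
    for i in targets:
        if (sigma[-1], i) not in w:
            continue  # i is not directly reachable from the last vertex
        step = w[(sigma[-1], i)]
        feasible = True
        for k in targets:
            to_k = 0 if k == i else hat[i][k]
            if k in last_seen:   # o_k > 0: interval since the last visit of k
                bound = elapsed(sigma, w, last_seen[k]) + step + to_k
            else:                # o_k = 0: reach k, then close the cycle
                bound = elapsed(sigma, w, 0) + step + to_k + hat[k][sigma[0]]
            if bound > d[k]:
                feasible = False
                break
        if feasible:
            domain.append(i)
    return domain

def closure_ok(sigma, targets, w, d):
    """Constraints (5): deadlines respected across the cycle closure."""
    end = len(sigma) - 1
    for i in targets:
        first = sigma.index(i)
        last = end - sigma[::-1].index(i)
        head = sum(w[(sigma[l], sigma[l + 1])] for l in range(first))
        if head + elapsed(sigma, w, last) > d[i]:
            return False
    return True

def recursive_call(sigma, targets, w, d, hat):
    """Algorithm 2: stop when the cycle is closed and covers all targets."""
    if sigma[0] == sigma[-1] and set(sigma) == set(targets):
        return list(sigma) if closure_ok(sigma, targets, w, d) else None
    for i in forward_checking(sigma, targets, w, d, hat):
        sigma.append(i)
        result = recursive_call(sigma, targets, w, d, hat)
        if result is not None:
            return result
        sigma.pop()  # backtrack
    return None

def find_solution(targets, w, d):
    """Algorithm 1: fix the first vertex and start the search."""
    targets = sorted(targets)
    hat = all_pairs_shortest(targets, w)
    return recursive_call([targets[0]], targets, w, d, hat)
```

On a hypothetical three-target corridor A - B - C with unit weights and deadlines d(A) = 4, d(B) = 2, d(C) = 4, the sketch returns the back-and-forth cycle (A, B, C, B, A); lowering d(C) to 3 empties the first domain and the search reports failure by returning `None`.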



3.4.2. Example

We apply our algorithm to the example of Figure 4(b). We use a random
selection in Step 1 of Algorithm 1 (to choose the first visited vertex of the
sequence) and in Step 7 of Algorithm 2 (to choose the elements of $F_j$ as part
of the current candidate solution). We report part of the execution trace
(Figure 5 depicts the complete search tree):

(a) the algorithm assigns $\sigma(1) = 14$;

(b) the domain $F_2$ (depicted in the figure between curly brackets beside
vertex $\sigma(1) = 14$) is produced as follows (recall the discussion of the
previous section):

• vertex 08 is added to $F_2$, since all the conditions in Algorithm 3
with $i = 08$ are satisfied;

• vertex 18 is not added to $F_2$, since the condition in Step 8 of
Algorithm 3 with $k = 06$ is not satisfied, formally,
$w(14, 18) + \hat{w}(18, 06) + \hat{w}(06, 14) > d(06)$;

• no other vertex is added to $F_2$, since no other vertex is directly
reachable from 14;

(c) the algorithm assigns $\sigma(2) = 08$;

(d) the domain $F_3$ is produced similarly as above, yielding $F_3 = \{06\}$;

(e) the algorithm assigns $\sigma(3) = 06$ and continues.

Some issues are worth noting. In the 9th node of the search tree, a sequence
$\sigma$ with $\sigma(1) = \sigma(s)$ and including all the vertices was found.
However, this sequence does not satisfy constraints (5). If the search were not
stopped and backtracked at the 9th node (in Step 5 of Algorithm 2), the algorithm
would never terminate. Indeed, the subtrees that would follow this vertex
would be the infinite repetition of part of the tree already built; in particular,
of the solution in bold of Figure 5. Finally, in the 5th node, no possible
successor is allowed by the forward checking, and therefore the algorithm
backtracks.

3.4.3. Some Properties of the Algorithm

In this section, we present some properties of the proposed algorithm. At 
first, we prove its soundness and completeness. 



Theorem 3.3. The algorithm of Section 3.4.1 is sound and complete.



Proof. We initially prove the soundness of the algorithm. We need to prove
that all the solutions it produces satisfy constraints (1)-(5). Constraints (1),
(2), and (5) are satisfied by Algorithm 2. If at least one of them does not
hold, no solution is produced. The satisfaction of constraints (3) is assured
by Algorithm 3 in Step 3, while the satisfaction of constraints (4) is assured
by Algorithm 3 in Steps 6 and 9.

Figure 5: Search tree for the example of Figure 4(b); bold nodes and arrows denote the
obtained solution; the domains $F_j$ are reported beside nodes $\sigma(j-1)$; $x^{th}$
denotes the order in which the tree's nodes are analyzed.

In order to prove completeness we need to show that the algorithm produces
a solution whenever at least one exists. In the algorithm there are only
two points in which a candidate solution is discarded. The first one is the
forward checking in Algorithm 3. Indeed, it iteratively applies constraints
(4)-(5) to a partial sequence $\sigma$ exploiting a heuristic over the future
weights (i.e., the time spent to visit the successive vertices). Since the
employed heuristic is admissible, no feasible candidate solution can be discarded.
The second point is the stopping criterion in Algorithm 2: when all the vertices
occur in $\sigma$ (at least once) and the first and the last vertex in $\sigma$
are equal, no further successor is considered and the search is stopped. If
$\sigma$ satisfies all the constraints, then $\sigma$ is a solution, otherwise
backtracking is performed. We show that, if a solution can be found without
stopping the search at this point, then a solution can be found also by stopping
the search and backtracking (the converse does not hold). This issue is of
paramount importance since it assures that the algorithm terminates (recall the
example of the previous section in which, without this stopping criterion, the
search could not terminate). Consider a $\sigma$ such that
$\sigma(1) = \sigma(s)$ and including all the vertices in T.
The search subtree following $\sigma(s)$ and produced by the proposed algorithm
is (non-strictly) contained in the search tree following from $\sigma(1)$. This is
because the constraints considered by the forward checking from $\sigma(s)$ on are
(non-strictly) harder than the corresponding ones from $\sigma(1)$ to $\sigma(s)$.
The increased hardness is due to the activation of constraints (4), which are
needed given that at least one occurrence of each vertex is in $\sigma$. Thus, if a
solution can be found by searching from $\sigma(s)$, then a shorter solution can be
found by stopping the search at $\sigma(s)$ and backtracking. This concludes the
proof of completeness. □
Now we derive an upper bound over the temporal length of $\sigma$.

Theorem 3.4. If the problem defined in Section 3.1 admits a solution, then
there exists at least one solution $\sigma$ with temporal length no longer than
$\max_{t \in T}\{d(t)\}$.

Proof. In order to prove the theorem it is sufficient to prove that, if a problem
is solvable, then there exists a solution $\sigma$ in which there is at least one
vertex that appears only once, excluding $\sigma(s)$. Indeed, if this statement
holds then the maximum temporal length of $\sigma$ is bounded by $d(i)$, where i is
the vertex that appears only one time in $\sigma$. It easily follows that, in the
worst case, the maximum temporal length of $\sigma$ is $\max_{t \in T}\{d(t)\}$.
Figure 6 shows a situation in which the temporal length of the unique solution is
exactly $\max_{t \in T}\{d(t)\}$.



Figure 6: A situation in which the temporally shortest solution is as long as the upper
bound of Theorem 3.4.

We now prove that, if the problem is solvable, then there is a solution
in which at least one vertex appears only once. To prove this, we consider a
solution $\sigma$ wherein $\sigma(1)$ is the vertex with the minimum relative
deadline, i.e., $\sigma(1) = \arg\min_{t \in T}\{d(t)\}$. (Notice that, according
to the discussion of Algorithm 1, this assignment does not preclude finding a
solution.) We call k the minimum integer such that all the vertices appear in the
subsequence $\sigma(1), \ldots, \sigma(k)$. We show that, if the problem is
solvable, then it is not necessary that vertex $v = \sigma(k)$ appears again after
k. A visit to v after k would be observed if either it is necessary to pass
through v to reach $\sigma(1)$ or it is necessary to re-visit v, due to its
relative deadline, before $\sigma(1)$. However,
since all the vertices but $v = \sigma(k)$ are visited before k, all the vertices
but v can be visited without necessarily visiting v. Furthermore, the deadline
of $\sigma(1)$ is by hypothesis harder than that of $\sigma(k)$, and then the
occurrence of $v = \sigma(k)$ after k is not necessary. Therefore, vertex
$\sigma(k)$ occurs only one time. □
The above theorem provides an upper bound on the temporal length
of $\sigma$. Notice that there are cases for which this bound is exact; namely,
there are instances of the problem for which the temporally shortest solution has
length exactly $\max_{t \in T}\{d(t)\}$ (as in the graph of Figure 6). In other
cases, the bound is not exact (as in Figure 6 with $d(01) = 5$). In any case, the
upper bound can be exploited to limit the depth of the search tree while preserving
the algorithm's completeness. Indeed, if the temporal length of the partial
sequence built so far, $\sum_{l} w(\sigma(l), \sigma(l+1))$, exceeds
$\max_{t \in T}\{d(t)\}$, then the search can be safely stopped and backtracked.

Finally, we focus on linear settings, namely on situations in which a
deterministic equilibrium strategy has to be found for a patroller acting in
corridors (see, e.g., Figure 6) and rings. These settings appear commonly in
real-world applications and their study can lead to the definition of simple
heuristics that are also very effective for non-linear settings.

Proposition 3.5. If a problem defined on a linear graph admits a solution, 
then the linear sequence of the vertices is a solution. 

Proof sketch. Consider a setting like that in Figure 6 with any functions $w$
and $d$. Suppose $\sigma(1) = 01$ and consequently $\sigma(2) = 02$. If there is no feasible
$\sigma$ with $\sigma(3) = 03$, then the problem is not feasible. Indeed, if $\sigma(3) = 03$ does
not satisfy the constraints over the deadline of 01 along the cycle closure,
then there is no $k$ such that $\sigma(k) = 03$ satisfies such constraints. The
same argument can be applied to any other vertex and to any linear graph.
□

The above proposition suggests a simple method to check the feasibility
of a problem defined on a linear graph. If the linear sequence is not feasible,
then the problem is infeasible; otherwise, it constitutes a solution. The length
of such a solution grows linearly in the number of vertices. Notice that the
problem can admit more solutions, and the length of some of them can be
larger than that of the linear solution. For example, in Figure 6 with $d(03)$
arbitrarily large, a solution could be $\sigma = \langle 01, 02, 01, 02, 03, 02, 01 \rangle$.
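
The feasibility check suggested by Proposition 3.5 can be sketched in a few lines of Python. The sketch assumes a reading of the constraints that is not spelled out in this passage: a cyclic sequence is taken to be feasible when the time between two consecutive visits to every vertex $t$ never exceeds $d(t)$. The function names, the integer vertex labels, and the encoding of the linear sequence as a closed walk along the corridor are illustrative choices, not part of the original algorithm.

```python
def max_revisit_gaps(cycle, w):
    """Longest time between consecutive visits to each vertex.

    cycle: list of vertices, implicitly closed back to cycle[0].
    w(i, j): travel time between two consecutive vertices of the cycle.
    """
    # Time at which each position of the cycle is reached.
    times = [0]
    for prev, cur in zip(cycle, cycle[1:]):
        times.append(times[-1] + w(prev, cur))
    period = times[-1] + w(cycle[-1], cycle[0])

    visits = {}
    for vertex, time in zip(cycle, times):
        visits.setdefault(vertex, []).append(time)

    gaps = {}
    for vertex, ts in visits.items():
        inner = [b - a for a, b in zip(ts, ts[1:])]
        wrap = period - ts[-1] + ts[0]  # gap across the cycle closure
        gaps[vertex] = max(inner + [wrap])
    return gaps


def is_feasible(cycle, w, d):
    """Assumed feasibility test: every vertex in d is revisited within d[v]."""
    gaps = max_revisit_gaps(cycle, w)
    return all(v in gaps and gaps[v] <= d[v] for v in d)


def linear_cycle(n):
    """Corridor 1..n walked end to end and back: 1, 2, ..., n, n-1, ..., 2."""
    return list(range(1, n + 1)) + list(range(n - 1, 1, -1))


def corridor_w(i, j):
    """Unit-length arcs along a corridor (hypothetical weights)."""
    return abs(i - j)
```

Under this reading, the longer solution quoted above, written as the closed walk `[1, 2, 1, 2, 3, 2]`, yields a gap of 6 for vertex 03, which is why it requires a large $d(03)$.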

Let us now consider our algorithm in linear settings. First, we note that
the problem is feasible if and only if the application of the forward checking
to the two extremes of the graph returns non-empty domains. This is because
the conditions considered in the forward checking correspond exactly to the
feasibility of the linear sequence and therefore, if the problem is infeasible,
the returned domain is empty. Furthermore, in order to guide the search
efficiently, it is sufficient that at each node of the search tree the successors are
ordered from the minimum to the maximum $O_j$. In this case, the algorithm
produces a search tree whose size is linear in the number of vertices.

3.4.4. Improving Efficiency and Heuristics

In this section, we show how to reduce the number of constraints to be 
checked in the forward checking, we introduce a more sophisticated stopping 
criterion, and we propose some heuristics to select vertices. 

Consider the conditions in Steps 5 and 8 of Algorithm 3. Except for
the first execution of Algorithm 3 (i.e., when $j = 2$), the satisfaction of the
condition at Step 5 for a given $j$ is guaranteed if the condition in Step 8 for
$j - 1$ is satisfied. Therefore, we can safely limit the algorithm to check the
condition at Step 5 exclusively when $j = 2$. The same considerations hold
for the conditions in Steps 6 and 9. Therefore, we can safely limit the
algorithm to check the condition at Step 6 exclusively when $j = 2$.

We also introduce a more sophisticated stopping criterion called LSC
(Length Stopping Criterion), based on Theorem 3.4, such that if
$\sum_{i=1}^{s-1} w(\sigma(i), \sigma(i+1)) + w(\sigma(s), \sigma(1)) > \max_{t \in T}\{d(t)\}$,
then the search is stopped and backtracked. We introduce also an a priori check
(IFC, Initial Forward Checking): before starting the search, we consider each
vertex as the root node of the search tree and we apply the forward checking.
If at least one domain is empty, the algorithm returns failure. Otherwise, the
tree search is started.
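
As a minimal sketch of the LSC test described above, the function below closes the partial sequence back to its first vertex and compares the resulting length with the largest penetration time. The names `lsc_stop`, `sigma`, `w`, and `d` are illustrative, and the sketch only covers the test itself, not the surrounding tree search.

```python
def lsc_stop(sigma, w, d):
    """Return True when the search below the partial sequence sigma can be cut.

    sigma: partial sequence of vertices (sigma[0] is the first vertex).
    w(i, j): temporal cost of moving from i to j.
    d: dict mapping each target to its penetration time.
    """
    # Length of the partial sequence ...
    path = sum(w(a, b) for a, b in zip(sigma, sigma[1:]))
    # ... plus the cost of closing the cycle back to the first vertex.
    closure = w(sigma[-1], sigma[0])
    return path + closure > max(d.values())
```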

Finally, we introduce some heuristic criteria for choosing the next vertex
to expand in Step 8 of Algorithm 2: lexicographic ($h_l$), random with uniform
probability distribution ($h_r$), maximum and minimum number of incident
arcs ($h_{\max a}$ and $h_{\min a}$), less visited ($h_{\min v}$), and maximum and minimum
penetration time ($h_{\max d}$ and $h_{\min d}$). For all the ordering criteria except $h_r$,
we introduce a criterion for breaking ties (RTB, Random Tie-Break) that
selects a vertex with uniform probability. We used the previous heuristics
also for selecting the initial node of the search tree in Step 1 of Algorithm 1.
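
One possible way to combine an ordering criterion with the random tie-break is sketched below for $h_{\min v}$. It assumes that "less visited" means the fewest occurrences in the sequence built so far; the `visits` dictionary and the function name are hypothetical. Shuffling first and then applying a stable sort leaves vertices with equal visit counts in a uniformly random relative order.

```python
import random


def order_successors(candidates, visits, rng):
    """Order candidates by increasing visit count, ties broken at random.

    candidates: iterable of candidate vertices.
    visits: dict vertex -> number of visits so far (missing means 0).
    rng: a random.Random instance, passed in for reproducibility.
    """
    shuffled = list(candidates)
    rng.shuffle(shuffled)  # uniform random order among all candidates
    # sorted() is stable, so equal keys keep the random order from the shuffle.
    return sorted(shuffled, key=lambda v: visits.get(v, 0))
```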

3.5. Experimental Results 

In this section, we experimentally evaluate the performance of our
algorithm in producing a deterministic equilibrium strategy or in returning
a failure. We developed a random generator of graphs $G'$ with parameters
$n$ (number of vertices, corresponding to targets) and $m$ (number of arcs),
working as follows. Given two values $n$ and $m$, first a random
connected graph with $n$ vertices is produced, then $m - n$ arcs are added. All
the arcs' weights are set equal to 1 (it can be easily shown that this is
the worst case for computational complexity). Values $d(k)$ are uniformly
drawn from the interval $[\min_{i,j}\{w(i,j) + w(j,i)\},\ 2n^2 \max_{i,j}\{w(i,j)\}]$, where
$w(i,j)$ is the length of the shortest path between vertices $i$ and $j$. The
lower bound of the interval comes from the consideration that settings with
$d(k) < \min_{i,j}\{w(i,j) + w(j,i)\}$ are infeasible and our algorithm immediately
detects that infeasibility (by IFC). The upper bound is justified by
considering that if a problem is feasible then it always admits a solution shorter
than $2n^2 \max_{i,j}\{w(i,j)\}$. Graphs differ from each other in the topology and
in the penetration times of the vertices. This program and those
implementing our algorithms have been coded in C and executed on a Linux (2.6.24
kernel) computer equipped with a dual quad-core Intel Xeon 2.33
GHz CPU, 8 GB RAM, and 4 MB cache.
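
The generator is described only at a high level, so the following Python sketch fills in details that are assumptions rather than the authors' C implementation: arcs are taken to be directed, the initial connected graph is a random directed cycle with $n$ arcs, the extra $m - n$ arcs are sampled without repetition, and penetration times are drawn as integers. Shortest paths are computed by breadth-first search, which suffices because all weights are 1.

```python
import random
from collections import deque


def random_setting(n, m, rng):
    """Return (arcs, w, d) for a random patrolling setting with unit weights."""
    if n < 2 or not n <= m <= n * (n - 1):
        raise ValueError("need n >= 2 and n <= m <= n(n-1)")

    # Assumed starting point: a random directed cycle through all vertices.
    order = list(range(n))
    rng.shuffle(order)
    arcs = {(order[k], order[(k + 1) % n]) for k in range(n)}

    # Add m - n further arcs, never duplicating an existing one.
    spare = [(i, j) for i in range(n) for j in range(n)
             if i != j and (i, j) not in arcs]
    arcs.update(rng.sample(spare, m - n))

    succ = {i: [] for i in range(n)}
    for i, j in arcs:
        succ[i].append(j)

    # All-pairs shortest path lengths w[i, j] by BFS from every vertex.
    w = {}
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, length in dist.items():
            w[source, v] = length

    low = min(w[i, j] + w[j, i] for i in range(n) for j in range(n) if i != j)
    high = 2 * n * n * max(w.values())
    d = {k: rng.randint(low, high) for k in range(n)}  # integer draw (assumption)
    return arcs, w, d
```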

For each ordering criterion (i.e., $h_l$, $h_r$, $h_{\max a}$, $h_{\min a}$, $h_{\min v}$, $h_{\max d}$,
$h_{\min d}$), with and without LSC and IFC, and for each $n \in \{3, 4, 5, 6, 7, 8, 100, 250, 500\}$,
we produce 1000 patrolling settings with $m$ uniformly drawn from the
interval $[n, (n-1)n]$ (if $m < n$ the graph is not connected; if $m > (n-1)n$ at
least one pair of vertices is connected by more than one arc). We evaluate the
percentage of terminations of the algorithm within 10 minutes and, in the
case of termination (either with a solution or with a failure), the
computational time. Since, as discussed in Section 3.1, our approach is aimed at
finding a solution and not the optimal solution according to a given metric
(e.g., the cycle length), we do not measure the quality of a solution.

The most significant experimental results we obtained are reported in
Table 1; the numbers reported in the table are averaged over the 1000 settings
that have been generated as discussed above. The table shows the
termination percentage and, for terminated runs, the average time, its standard
deviation, and the maximum and minimum observed times. The first remark
is that, for all the algorithm configurations, the averaged computational time
is reasonably short also for large settings. Instead, the termination
percentage differs very much in different configurations. In this sense, the behavior
of the proposed algorithm resembles that of many constraint programming
algorithms, whose termination time is usually either very short (when a
solution is found) or the algorithms do not terminate within the deadline. The
random generation of graphs explains the data relative to the maximum
computational time: some cases are harder than the average and require a lot of
time to be solved (in practice, they both reduce the percentage of termination
and increase the computational time). These hard cases, which represent
outliers of the population of graphs, are characterized by complicated topologies
or oddly-distributed relative deadlines.

n                3       4       5       6       7       8     100     250     500

$h_{\min v}$ with RTB, LSC, IFC
%              100     100     100    99.8    99.6    99.5    98.9    96.6    90.2
time [s]    < 0.01  < 0.01  < 0.01    0.32    0.10    0.05    0.16    0.87    5.50
dev [s]     < 0.01  < 0.01  < 0.01    5.17    1.78    0.96    3.52   14.47   30.28
max [s]     < 0.01  < 0.01  < 0.01   98.00   35.00   19.00   78.26   316.9  413.94
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.01    0.07

$h_r$ with LSC, IFC
%              100     100     100    98.5    97.5    96.5    95.1    55.1     9.8
time [s]    < 0.01  < 0.01    0.11    0.09    0.16    0.02    1.34    2.52    4.66
dev [s]     < 0.01  < 0.01    1.64    1.70    1.73    0.18    6.19   16.75   51.62
max [s]     < 0.01  < 0.01   32.00   33.00   24.00    2.00   93.36  513.66  590.87
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.01    0.07

$h_r$ with IFC
%              100     100    99.0    97.2    96.7    95.5    94.0    53.0     8.9
time [s]    < 0.01    0.44    3.65    0.14    0.26    0.01    7.12    3.41    5.94
dev [s]     < 0.01    8.68   38.89    2.24    2.36    0.16   39.32   18.02   55.14
max [s]     < 0.01  173.55  594.10   43.03   31.86    2.09  561.95  501.72  582.77
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.01    0.07

$h_{\min v}$ with RTB, LSC
%              100     100     100    96.7    96.0    95.5    95.0    93.3    86.2
time [s]    < 0.01  < 0.01    0.34    2.98    0.16    0.01    0.30    1.00    6.19
dev [s]     < 0.01  < 0.01    6.29   33.77    2.24    0.11    6.50   15.32   35.77
max [s]     < 0.01  < 0.01  125.03  519.75   42.22    2.41  145.22  366.42  498.04
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.01    0.07

$h_r$ with LSC
%              100     100     100    95.4    93.9    92.5    91.2    52.4     7.7
time [s]    < 0.01  < 0.01    0.79    3.04    0.30    0.03    7.16    3.48    5.83
dev [s]     < 0.01  < 0.01   13.58   24.32    2.77    0.21   39.53   18.46   55.65
max [s]     < 0.01  < 0.01  270.03  303.72   41.53    2.83  566.04  531.64  596.42
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.02    0.01    0.07

$h_r$
%              100     100    98.7    94.2    93.0    91.8    90.3    51.0     7.1
time [s]    < 0.01    7.45    2.45    4.78    1.38    0.14    1.37    3.74    6.18
dev [s]     < 0.01   55.45   28.61   42.13    9.96    1.03    6.28   18.45   56.80
max [s]     < 0.01  556.92  506.72  496.84  140.31   12.86   93.26  516.72  576.52
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.02    0.01    0.07

$h_l$ with LSC, IFC
%              100    99.2    91.0    81.1    75.3    69.0     3.9     2.3     1.5
time [s]    < 0.01    7.45    2.45    4.78    1.38    0.14    0.10    0.01    0.07
dev [s]     < 0.01   55.45   28.61   42.13    9.96    1.03  < 0.01  < 0.01  < 0.01
max [s]     < 0.01  548.41  505.74  497.46  140.11   12.01  < 0.01    0.01    0.07
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01    0.01    0.07

$h_l$
%              100    99.2    88.0    78.0    71.7    65.0     0.0     0.0     0.0
time [s]    < 0.01    7.42    2.61    5.12    1.61    0.20       -       -       -
dev [s]     < 0.01   55.23   28.66   42.65   10.57    1.29       -       -       -
max [s]     < 0.01  548.41  505.74  497.46  140.11   12.01       -       -       -
min [s]     < 0.01  < 0.01  < 0.01  < 0.01  < 0.01  < 0.01       -       -       -

Table 1: Experimental results for different algorithm configurations.

Here are some comments regarding other issues. 

Ordering criteria. The best ordering criterion is $h_{\min v}$ with RTB. We omit
the experimental results with $h_{\max a}$, $h_{\min a}$, $h_{\max d}$, and $h_{\min d}$, since
they are very similar to those obtained with $h_l$. The criterion $h_{\min v}$
with RTB leads the algorithm to terminate with a percentage close to
that of $h_r$ for small values of $n$ and about 80% larger for large values of $n$.
Instead, $h_l$ provides very bad performance, especially for large values
of $n$, when the algorithm terminates with percentages close to 0%.

LSC. The improved stopping criterion allows the algorithm to increase the
termination percentage by a value between 0% and 2%, without
distinguishable effects on the computational time. This improvement
depends on the configuration of the algorithm since it affects the
construction of the search tree.

IFC. This criterion allows the algorithm to increase the termination
percentage by a value between 1% and 4%, reducing the computational time
(since all the non-feasible linear settings are solved with a negligible
computational time). This improvement does not depend on the
configuration of the algorithm since it does not affect the search, working
before it.

Therefore, the best algorithm configuration is $h_{\min v}$ with RTB, LSC, and IFC.
With this configuration, the results are good: the termination percentage
is very high also for large settings, like those with 500 vertices, and the
corresponding average computational time, about 5.5 s, is reasonably short.
We notice that in some practical indoor settings the number of vertices is in
the range $\{24, \ldots, 100\}$; see, e.g., [24].



4. Finding Non-Deterministic Equilibrium Patrolling Strategies 

In this section, we consider the problem of finding non-deterministic
equilibrium patrolling strategies for the game formulated in Section 2.2. We recall
that, according to the proposed approach, given a patrolling setting, we first
look for a deterministic equilibrium strategy that, if found, makes attacking
a target not rational for the intruder. If a deterministic equilibrium
strategy cannot be found, we look for a non-deterministic equilibrium strategy,
following the algorithm presented in this section to find leader-follower
equilibria. In what follows, we survey the main related works on the computation
of leader-follower equilibria (Section 4.1), we present our solving algorithm
(Section 4.2), and we experimentally evaluate it (Section 4.3).

4.1. Related Works

The literature presents some works on the computation of leader-follower
equilibria that are based on operations research techniques. Basically, the
computation of a leader-follower equilibrium can be formulated as a
multi-level optimization problem [40], where each level corresponds to a specific
player. In our case, the first optimization level is associated with the patroller
and the second one with the intruder. When both the optimization problems (at
the first and second level) are linear, the algorithmic game theory literature
provides established techniques for their solution [41]. When, instead, at least
one optimization problem (at the first or second level) is not linear, as
turns out to be the case here, works in the literature provide more a collection of
examples than a coherent set of results. We briefly review these works.

The main result on which other algorithms for finding leader-follower
equilibria are based is by von Stengel and Zamir [16]: at the equilibrium,
the follower always employs pure strategies, playing the best response to the
strategy the leader committed to. (We recall that, at the equilibrium, the
follower, when several equivalent best responses are available, must choose the
one that maximizes the leader's expected utility.)

Two main works address the situation in which both the levels
of the optimization problem are linear. The first one is by Conitzer and
Sandholm [41]. The authors formulate the problem of finding a leader-follower
equilibrium as a multi-linear programming problem (Multi-LP). The basic
idea is, for each action $a$ of the follower, to compute the (mixed) strategy
$\sigma_l$ of the leader that maximizes the leader's expected utility under the constraint
that $a$ is a follower's best response. Among all such strategies $\sigma_l$, the leader
will choose the one that maximizes its expected utility. With this method,
there are as many optimization problems as pure strategies of the follower
and each single optimization problem is linear. Conitzer and Sandholm show
in [41] that the problem of computing mixed-strategy leader-follower
equilibria in two-player complete-information games is solvable in time polynomial
in the size of the game. The same result holds also when the leader's type is
uncertain. Instead, when the follower's type is uncertain the problem is NP-hard.
The second work is by Paruchuri et al. [8], in which the authors propose an
alternative mathematical programming formulation based on mixed integer linear
programming (MILP) that is shown to be more efficient with multiple
follower's types with respect to the Multi-LP formulation of [41]. The basic
idea behind this alternative approach is to represent the follower's actions as
binary variables where value 1 means that the corresponding action is played
and value 0 means that it is not.
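
To make the "one optimization problem per follower action" idea concrete, here is a small sketch restricted to a leader with exactly two actions, so that each linear program has a single variable $p$ (the probability of the first leader action) and can be solved exactly by inspecting the endpoints of its feasible interval. This restriction, the function name, and the payoff layout are assumptions made for illustration; it is not the implementation of [41] and it does not handle the bilinear patrolling case.

```python
from fractions import Fraction


def best_commitment(UL, UF):
    """Leader-follower equilibrium for a leader with two actions.

    UL[i][a], UF[i][a]: leader and follower payoffs when the leader plays
    action i in {0, 1} and the follower plays action a.
    Returns (leader value, p, follower action), p = probability of action 0.
    """
    best = None
    for a in range(len(UL[0])):
        lo, hi = Fraction(0), Fraction(1)
        feasible = True
        for b in range(len(UL[0])):
            if b == a:
                continue
            # a is at least as good as b for the follower when
            # p * c1 + (1 - p) * c0 >= 0, i.e. p * (c1 - c0) + c0 >= 0.
            c1 = Fraction(UF[0][a] - UF[0][b])
            c0 = Fraction(UF[1][a] - UF[1][b])
            slope = c1 - c0
            if slope > 0:
                lo = max(lo, -c0 / slope)
            elif slope < 0:
                hi = min(hi, -c0 / slope)
            elif c0 < 0:
                feasible = False
        if not feasible or lo > hi:
            continue
        # The objective is linear in p, so an endpoint is optimal.
        for p in (lo, hi):
            value = p * UL[0][a] + (1 - p) * UL[1][a]
            if best is None or value > best[0]:
                best = (value, p, a)
    return best
```

Taking the maximum over all follower actions also encodes the tie-breaking rule recalled above: when the follower is indifferent, the response that is better for the leader is the one retained.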

As said, when the optimization problems involved in the determination
of leader-follower equilibria are non-linear, no established technique is available.
Non-linear optimization problems have been studied mainly when the
non-linearity lies in the second level and the optimization problem is convex and
regular. In these cases, the Karush-Kuhn-Tucker theorem [42] is employed to
linearize the optimization problem. In our case, we cannot apply this method
because, as discussed later, our problem is not convex. Alternative methods
are based on non-exact linearization and produce approximate solutions [42].

Computing equilibria in games in practical settings usually requires a
large computational effort. A simple approach to save computational time
is the reduction of the search space. Customarily, in game theory, this is
accomplished by exploiting the concept of dominance: action $a$ of player $i$ is
said to be (weakly) dominated by another action $a'$ of player $i$ if player $i$'s
expected utility from playing action $a$ is never larger than its expected utility
from playing action $a'$ (independently of the actions of the other players).
Dominated actions can be safely removed, since they will never be played
by rational agents. Usually, removing dominated actions is computationally
inexpensive and drastically reduces the search space, improving the
computational efficiency of the solving algorithm [43].



4.2. Solving Algorithm

In this section, we overview the proposed solving algorithm (Section 4.2.1),
we describe in detail its steps (Section 4.2.2 and Section 4.2.3), we provide a
specific formulation for strictly competitive settings (Section 4.2.4), and we
discuss some theoretical properties of the algorithm (Section 4.2.5).

4.2.1. Algorithm Overview

Our algorithm works on the original graph $G$ as defined in Section 2.1. We
shall show in Section 4.2.5 that a reduction to a graph $G'$ composed only of
target vertices (as done in Section 3) cannot be applied when searching for a
non-deterministic equilibrium strategy. The algorithm depends on the value
of $l$, namely on the finite length of the actions' history. In this paper, we provide
the algorithm for the case $l = 1$. This is because formulations with $l = 0$ are
applicable only to environments with fully connected topologies (as discussed
in Section 2.3 and further detailed in Section 4.2.5) and formulations with
$l > 1$, as we shall discuss below, can be obtained by extending the case with
$l = 1$. Our algorithm develops in three steps as follows.

1. The first step removes the dominated actions (of the patroller and of
the intruder) that the agents will never play.

2. The second step checks whether or not there exists a patroller's
strategy such that the intruder's action stay-out is a best response. If
such a strategy exists, it is the optimal patrolling strategy. Otherwise,
the algorithm passes to the third step.

3. The third step computes the equilibrium strategy for the
patroller under the assumption that the intruder will not play stay-out.
4.2.2. Removing Dominated Actions

The first step of our algorithm is the removal of agents' dominated actions.
We remark that, to the best of our knowledge, ours is the first attempt to
apply dominated action removal to patrolling problems. This step splits
into two phases: at first we remove the patroller's dominated actions and,
subsequently, we remove the intruder's dominated actions.

We focus on the patroller's side. We recall that the actions of the patroller
are of the form move(i). This means that, if action move(i) is dominated,
the patroller will never visit vertex $i$ and then such vertex can be eliminated
from the patrolling problem. Therefore, by discarding the patroller's dominated
actions, we obtain a reduction of the graph $G$. Basically, we can remove any
vertex $i$ (and all its ingoing and outgoing arcs) such that the
corresponding action move(i) is dominated by another action move(j) (with $j \neq i$),
independently of the intruder's strategy, in the following way.

1. For each target $t$, we remove from $G$ any vertex $i$ such that the shortest
distance between $i$ and $t$ is strictly larger than the penetration time
$d(t)$. A rational patroller will never visit such vertices. Indeed, if it
visits them, then the intruder will attack target $t$ being sure not to be
captured in the next $d(t)$ turns. If, after this removal, the graph is not
connected, then there is no non-deterministic equilibrium strategy
that can cover all the targets. In this case, as discussed in Section 2.4,
multiple robots should be adopted.
2. After having reduced the graph $G$ as described above, we compute the
shortest paths for any pair of targets $t$, $t'$ and we remove all the vertices
that do not belong to any shortest path. The idea is that if a vertex
$z$ is not on any shortest path and a strategy $\sigma_p$ prescribes that the
patroller can make action move(z) with strictly positive probability,
then it can be easily observed that, if the patroller does not make such
action, it cannot decrease its expected utility. Indeed, the probability
to reach any target $t$ within $d(t)$ turns cannot decrease since visiting $z$
would introduce an unnecessary temporal cost. In the case there are
multiple shortest paths between two targets, we keep all of them (in
some specific cases, we could select a particular shortest path as we 
showed in (2p|). 

We call $G_r = (V_r, A_r, T, v, d)$ the reduced graph produced as prescribed by
the two above steps. Graph $G_r$ can be obtained in linear time in the size $n$
(number of vertices) of the patrolling setting, given the shortest paths. We
recall that these shortest paths have already been computed (by applying
Dijkstra's algorithm) when our algorithm searched for the deterministic
equilibrium strategy. We report in Figure 7 the graph $G_r$ for our running example
of Figure 1 after having removed the vertices corresponding to the dominated
actions move(i) of the patroller. The dominated actions are: move(i) with
$i \in \{04, 05, 09, 10, 15, 16, 17, 20, 25, 26, 27, 28, 29\}$.
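
The two removal rules for the patroller can be sketched as follows. The sketch makes simplifying assumptions that are not in the text: the graph is undirected with unit-length arcs, so breadth-first search replaces Dijkstra's algorithm, and a vertex is kept in the second phase when it lies on at least one shortest path between some pair of targets. The function names and the choice to raise an error when a target itself is cut off are illustrative.

```python
from collections import deque


def bfs(adj, source, allowed):
    """Shortest distances from source, moving only through allowed vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reduce_graph(adj, d):
    """Vertices surviving the removal of the patroller's dominated actions.

    adj: undirected adjacency dict (vertex -> list of neighbours).
    d: dict mapping each target to its penetration time.
    """
    vertices = set(adj)
    # Rule 1: drop every vertex farther than d(t) from some target t.
    keep = set(vertices)
    for t, limit in d.items():
        dist = bfs(adj, t, vertices)
        keep = {i for i in keep if i in dist and dist[i] <= limit}
    if not set(d) <= keep:
        raise ValueError("a single patroller cannot cover all the targets")

    # Rule 2: keep only vertices on some shortest path between two targets.
    dist_from = {t: bfs(adj, t, keep) for t in d}
    survivors = set(d)
    for t in d:
        for u in d:
            if t == u or u not in dist_from[t]:
                continue
            for v in keep:
                if (v in dist_from[t] and v in dist_from[u]
                        and dist_from[t][v] + dist_from[u][v] == dist_from[t][u]):
                    survivors.add(v)
    return survivors
```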

We now focus on the intruder's actions. We recall that the actions of
the intruder are of the form enter-when(t, h), where $t$ is a target and $h$ is
a history, and stay-out. Considering $l = 1$, actions enter-when(t, h) reduce
to enter-when(t, i), with the meaning that the intruder enters target $t$ after
it observed the patroller in vertex $i$. We consider only dominance between
actions enter-when(t, i). This means that, if action enter-when(t, i) is
dominated, the intruder will never attack $t$ after having observed the patroller in
$i$ and then such action can be discarded. We remove the intruder's dominated
actions in the following way.

Figure 7: Graph $G_r$ for the patrolling setting of Figure 1 obtained by removing the
vertices corresponding to the patroller's dominated actions.

For a fixed target $t$, enter-when(t, i) is dominated by another intruder's
action enter-when(t, i') if the patroller, when covering every path starting from
$i'$ and arriving at $t$ before $d(t)$ turns, must always visit vertex $i$. The basic
idea is: the intruder's expected utility for action enter-when(t, i) depends
on the probability that such action will lead to a successful intrusion. This
probability is related to the probability that the patroller reaches $t$ within
$d(t)$ turns from its current vertex $i$. It can be easily observed that, in the
case the patroller, starting from $i'$, must always visit $i$ to reach $t$ within $d(t)$
turns, the probability that the patroller reaches $t$ starting from $i$ within $d(t)$
turns is not smaller than the probability that the patroller reaches $t$ from $i'$
within $d(t)$ turns (it is a trivial application of Markov chains). Therefore,
the expected utility (for the intruder) of enter-when(t, i) is not larger than
the expected utility of enter-when(t, i'). This holds independently of the
patroller's strategy. It can be easily observed, instead, that, given two
different targets $t_1$ and $t_2$, actions enter-when($t_1$, i) and enter-when($t_2$, i') may
not have any dominance relationship for any possible $i$ and $i'$ independently
of the patroller's strategy.

Consider the example reported in Figure 3. Consider target 08 and
vertices 01 and 02. The probability that the patroller reaches target 08 within
$d(08) = 8$ turns starting from 01 is not larger than the probability that the
patroller reaches target 08 within $d(08) = 8$ turns starting from 02.
Indeed, when the patroller starts from vertex 01, it must always pass through
vertex 02 to reach 08 by 8 turns. Therefore, the intruder will prefer to
play enter-when(08, 01) rather than enter-when(08, 02) independently of the
patroller's strategy. This shows why, in this example, enter-when(08, 01)
dominates enter-when(08, 02) and this last action can be removed.
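
The dominance condition stated above can also be tested directly, without building a tree of paths: enter-when(t, i) is dominated by enter-when(t, i') exactly when, once vertex $i$ is deleted, $i'$ can no longer reach $t$ within $d(t)$ turns. The sketch below uses this direct test, which is a swapped-in technique and not the tree search of the paper. It assumes a directed graph given as a successor dictionary, unit-length arcs, "within $d(t)$ turns" read as at most $d(t)$ arcs, and that vertices farther than $d(t)$ from $t$ have already been removed; all names are hypothetical.

```python
from collections import deque


def distance(succ, source, target, removed):
    """BFS distance from source to target avoiding `removed`, or None."""
    if source == removed or target == removed:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in succ[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def non_dominated_vertices(succ, t, d_t):
    """Vertices i != t for which enter-when(t, i) is not dominated."""
    others = [v for v in succ if v != t]
    result = set()
    for i in others:
        dominated = False
        for other in others:
            if other == i:
                continue
            # Every short enough path from `other` to t goes through i
            # exactly when deleting i pushes the distance beyond d_t.
            length = distance(succ, other, t, removed=i)
            if length is None or length > d_t:
                dominated = True
                break
        if not dominated:
            result.add(i)
    return result
```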

We call $V_t \subseteq V_r$ the subset of vertices that satisfy the following condition:
for every $i \in V_t$, action enter-when(t, i) is not dominated. Therefore, $V_t$
provides a compact representation of the non-dominated actions. Set $V_t$ can
be found by resorting to tree search techniques in the following way. We
denote by $q$ a node in the search tree and by $\eta(q)$ the vertex corresponding
to $q$. We build a tree of paths where, calling $q_0$ the root, $\eta(q_0) = t$ and a
node $q''$ is a successor of $q'$ if and only if $q'' \in \{q \text{ s.t. } a(\eta(q'), \eta(q)) = 1\}$
and $q'' \neq q'$ and $q'' \neq \mathit{father}(q')$. The maximum depth of the tree is $d(t)$.
That is, we consider all the paths not longer than $d(t)$ and with $t$ as first
vertex. Figure 8 reports the tree of paths generated for $t = 06$. Given a tree
of paths so built, an action enter-when(t, i) is dominated when there exists
a vertex $i'$ such that for each $q$ with $\eta(q) = i$ there is a $q'$ with $\eta(q') = i'$
and $\mathit{father}(q') = q$. In this case, $i \notin V_t$. In Figure 8, black nodes denote
vertices $i$ such that actions enter-when(06, i) are dominated; for example,
action enter-when(06, 13) is dominated since every occurrence of vertex 13
in the search tree has a node with vertex 14 as child. The computational
complexity of removing the intruder's dominated actions is $O(|T|\, b^{\max_{t \in T}\{d(t)\}})$,
where $T$ is the set of targets, $b$ is the largest number of outbound arcs from
a target, and $\max_{t \in T}\{d(t)\}$ is the largest penetration time.

To summarize, the results produced by the first step of our algorithm are:

• a reduced patrolling setting $G_r$, obtained from $G$ by removing some
vertices and their corresponding arcs that represent the patroller's dominated
actions,

• for each $t \in T$, a subset of vertices $V_t$, containing the vertices such that
every action enter-when(t, i) with $i \in V_t$ is not dominated.

4.2.3. Mathematical Programming Formulation

In this section, we illustrate the second and the third step of our
algorithm. We present them together since they are conceptually similar, being
both based on mathematical programming. The solution of the
mathematical programming problem is the non-deterministic equilibrium patrolling
strategy we are looking for. More precisely, we provide two mathematical
programming formulations, one for each step of the algorithm, whose
solution can be obtained using optimization software tools, e.g., [44]. As before,
we provide our mathematical formulations when $|h| = l = 1$. With $l = 1$, the
Markov hypothesis holds and the patroller's strategy $\sigma_p$ introduced
in Section 2.3 can be compactly represented by $\{\alpha_{i,j}\}$ with $i, j \in V_r$, where each
$\alpha_{i,j}$ denotes the probability for the patroller to move from vertex $i$ to vertex
$j$ (namely, to take action move(j) while in $i$). Our formulation is inspired
by and introduces some non-linearities [42]. More precisely, given a pure
strategy of the intruder $\sigma_{\mathbf{i}} = a$, the maximization of the patroller's expected
utility is linear in the objective and bilinear in the constraints. The
non-linearity is due to the symmetries of our game model that are introduced
by the Markov hypothesis: we need to constrain behavioral strategies$^{1}$ to
be equal in the same state, i.e., $\alpha_{i,j}$ is fixed for all the decision nodes for
which the patroller's current vertex is $i$. This non-linearity forces us to avoid
a mixed integer formulation (as in [8]) that, in our case, would be a mixed
integer non-linear problem whose efficient solution is still an open issue in
the operations research field. In order to have the minimal non-linear degree
(i.e., quadratic degree) we took inspiration from the sequence form proposed
in [53].

Figure 8: Search tree for finding dominated actions for target 06 of Figure 7.

We now present the details of the second step of our algorithm, where
we check whether there exists at least one patroller's strategy $\sigma_p$ such that
stay-out is a best response for the intruder. If such a strategy exists, then the
patroller will follow it, its utility being maximum when the intruder abstains
from the intrusion (recall the utility definition of Section 2.2). This step is
formulated as a bilinear feasibility problem in which the $\alpha_{i,j}$s are the decision
variables. We denote by $V_r \setminus i$ the set obtained by removing element $i$ from
set $V_r$ and by $\gamma^{w,t}_{i,j}$ the probability that the patroller reaches vertex $j$ in $w$
steps, starting from vertex $i$ and not sensing (i.e., not passing through) target
$t$. The feasibility problem is the following:

\begin{align}
\alpha_{i,j} &\geq 0 && \forall i, j \in V_r \tag{6}\\
\sum_{j \in V_r} \alpha_{i,j} &= 1 && \forall i \in V_r \tag{7}\\
\alpha_{i,j} &\leq a_r(i,j) && \forall i, j \in V_r \tag{8}\\
\gamma^{1,t}_{i,j} &= \alpha_{i,j} && \forall t \in T,\ i, j \in V_r,\ j \neq t \tag{9}\\
\gamma^{w,t}_{i,j} &= \sum_{x \in V_r \setminus t} \left( \gamma^{w-1,t}_{i,x}\, \alpha_{x,j} \right) && \forall w \in \{2, \ldots, d(t)\},\ \forall t \in T,\ i, j \in V_r,\ j \neq t \tag{10}\\
u_{\mathbf{i}}(\text{intruder-capture}) \Bigl( 1 - \sum_{i \in V_r \setminus t} \gamma^{d(t),t}_{z,i} \Bigr)
  + u_{\mathbf{i}}(\text{penetration-}t) \sum_{i \in V_r \setminus t} \gamma^{d(t),t}_{z,i} &\leq 0 && \forall t \in T,\ z \in V_t \tag{11}
\end{align}

Constraints (6)-(7) express that probabilities $\alpha_{i,j}$s are well
defined; constraints (8) express that the patroller can only move between two
adjacent vertices; constraints (9)-(10) express the Markov hypothesis over the
patroller's decision policy; constraints (11) express that no action
enter-when(t, z) gives to the intruder an expected utility larger than that of
stay-out. The non-linearity is due to constraints (10). If the above problem
admits a solution, the resulting $\alpha_{i,j}$s are the optimal patrolling
strategy. We notice that, due to constraints (10), the above feasibility
problem is not convex and we cannot linearize it by applying the
Karush-Kuhn-Tucker theorem (see Section ??). When no dominated action can be
removed, the problem presents $O(m n^2 \max_{t \in T}\{d(t)\})$ variables and
constraints (where $n$ is the number of vertices in $G$ and $m$ is the number
of targets). In practical settings, removing dominated actions drastically
reduces the number of variables and constraints, as we shall show in
Section 4.3.

1 We recall that a behavioral strategy of an agent in a given decision node is the strategy
conditioned by the agent being at such node.
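As an illustration of how constraints (9)-(11) interact, the following minimal sketch evaluates them for a given candidate strategy instead of solving the bilinear problem. It is not the implementation used in the paper (which relies on AMPL and SNOPT); the function names, the dense-matrix representation of $\alpha$, and the three-vertex instance with self-loops are assumptions made only for this example.

```python
from typing import Dict, List

Matrix = List[List[float]]


def avoid_target(alpha: Matrix, target: int, steps: int) -> Matrix:
    """gamma[i][j]: probability of going from i to j in `steps` moves
    without ever entering `target`, as in constraints (9)-(10)."""
    n = len(alpha)
    # Constraint (9): a single move, with moves into the target excluded.
    gamma = [[0.0 if j == target else alpha[i][j] for j in range(n)]
             for i in range(n)]
    # Constraint (10): append one move at a time, never through the target.
    for _ in range(2, steps + 1):
        gamma = [[0.0 if j == target else
                  sum(gamma[i][x] * alpha[x][j] for x in range(n) if x != target)
                  for j in range(n)]
                 for i in range(n)]
    return gamma


def intruder_utility(alpha: Matrix, target: int, start: int, d: int,
                     u_capture: float, u_penetration: float) -> float:
    """Left-hand side of constraint (11) for action enter-when(target, start)."""
    gamma = avoid_target(alpha, target, d)
    p_success = sum(gamma[start][i] for i in range(len(alpha)) if i != target)
    return u_capture * (1.0 - p_success) + u_penetration * p_success


def stay_out_is_best_response(alpha: Matrix, d: Dict[int, int], u_capture: float,
                              u_penetration: Dict[int, float],
                              tol: float = 1e-9) -> bool:
    """True if no enter-when(t, z) pays the intruder more than 0 (stay-out)."""
    n = len(alpha)
    return all(
        intruder_utility(alpha, t, z, d[t], u_capture, u_penetration[t]) <= tol
        for t in d for z in range(n)
    )


# Hypothetical instance: 3 fully connected vertices (self-loops allowed),
# all of them targets with penetration time 2, uniform random patrolling.
uniform = [[1 / 3] * 3 for _ in range(3)]
d = {0: 2, 1: 2, 2: 2}
low_value = stay_out_is_best_response(uniform, d, -1.0, {0: 1.0, 1: 1.0, 2: 1.0})
high_value = stay_out_is_best_response(uniform, d, -1.0, {0: 2.0, 1: 2.0, 2: 2.0})
```

In this toy instance the intruder avoids capture with probability 4/9, so stay-out remains a best response when a successful penetration is worth 1 but not when it is worth 2.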

We now discuss the relationship between this second step of our algorithm
to compute non-deterministic equilibrium strategies and the algorithm
presented in Section 3 to compute deterministic equilibrium strategies. The
similarities between these two algorithms are due to the fact that they both
produce patrolling strategies (if they exist) such that the intruder's best
response is stay-out. However, their scope is different and they are both
necessary. Indeed, if there exists a deterministic equilibrium strategy with
$|h| = l > 1$, the above mathematical programming problem fails to find it,
since it is not Markovian, and such a strategy can be found only by applying
the algorithm of Section 3. On the other hand, when no deterministic
equilibrium strategy exists, there could be a non-deterministic equilibrium
strategy such that the intruder's best response is stay-out. Obviously, an
extension of the above formulation with $l$ very large would make the
algorithm for the computation of deterministic equilibrium strategies
unnecessary. However, such a formulation is expected to be hardly solvable in
practice.

When the above feasibility problem does not admit any solution (i.e.,
there is no patroller's strategy such that stay-out is a best response for
the intruder), we pass to the third step of the algorithm. In this step, we find
the best response of the intruder such that the patroller can maximize its
expected utility. This problem is formulated as a multi-bilinear programming
problem. The single bilinear problem, in which enter-when(s, q) with $s \in T$
and $q \in V$ is assumed to be the intruder's best response, is defined as:

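\begin{align}
\max \quad & u_{\mathbf{p}}(\text{penetration-}s) \sum_{i \in V \setminus s} \gamma_{q,i}^{d(s),s} + u_{\mathbf{p}}(\text{intruder-capture}) \Bigl(1 - \sum_{i \in V \setminus s} \gamma_{q,i}^{d(s),s}\Bigr) \notag\\
\text{s.t.} \quad & \text{constraints (6)-(10)} \notag\\
& u_{\mathbf{i}}(\text{intruder-capture}) \Bigl(\sum_{i \in V \setminus t} \gamma_{z,i}^{d(t),t} - \sum_{i \in V \setminus s} \gamma_{q,i}^{d(s),s}\Bigr) + u_{\mathbf{i}}(\text{penetration-}s) \sum_{i \in V \setminus s} \gamma_{q,i}^{d(s),s} - u_{\mathbf{i}}(\text{penetration-}t) \sum_{i \in V \setminus t} \gamma_{z,i}^{d(t),t} \geq 0 \quad \forall t \in T,\ z \in V \tag{12}
\end{align}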



The objective function maximizes the patroller's expected utility.
Constraints (12) express that no action enter-when(t, z) gives a larger value
to the intruder than action enter-when(s, q). When no dominated action can
be discarded, we can formulate $m \cdot n$ such problems, one for each
possible action enter-when(s, q) with $s \in T$, $q \in V$. In practice, the
number of bilinear problems to be considered is usually much smaller thanks to
the elimination of dominated actions. The size, in terms of variables and
constraints, of each optimization problem is the same as that of the
feasibility problem of the second step of our algorithm. If a bilinear problem
is feasible, its solution is a set of probabilities $\{\alpha_{i,j}\}$ that
define a possible patrolling strategy. If the problem is infeasible, then
there is no patroller's strategy such that enter-when(s, q) is a best response
for the intruder. From all the solutions of feasible problems, we pick the one
that gives the patroller the maximum expected utility. As a final step, we
need to check whether or not all the targets can be covered by the strategy.
This can be easily accomplished by inspecting the randomized strategy. If all
the targets can be covered with a strictly positive probability, the produced
strategy is the optimal non-deterministic strategy for our patrolling setting,
and it corresponds to the leader-follower equilibrium strategy. Otherwise,
there does not exist (with $l = 1$) any non-deterministic equilibrium strategy
that can cover all the targets.
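The selection among the bilinear problems of the third step can be sketched as a simple loop. This is only an outline under stated assumptions: `solve_bilinear` is a hypothetical callback standing in for an external non-linear solver, assumed to return `None` when the problem for a given intruder action is infeasible and a pair (patroller's expected utility, strategy) otherwise.

```python
from typing import Callable, Iterable, Optional, Tuple

Action = Tuple[int, int]          # (s, q): enter target s when the patroller is at q
Solved = Tuple[float, object]     # (patroller's expected utility, strategy alpha)


def best_leader_strategy(
    actions: Iterable[Action],
    solve_bilinear: Callable[[Action], Optional[Solved]],
) -> Optional[Tuple[Action, float, object]]:
    """Solve one bilinear problem per candidate best response of the intruder
    and keep the feasible solution with the largest patroller's utility."""
    best = None
    for action in actions:
        result = solve_bilinear(action)   # hypothetical solver call
        if result is None:                # infeasible: action cannot be a best response
            continue
        utility, alpha = result
        if best is None or utility > best[1]:
            best = (action, utility, alpha)
    return best


# Usage with a stub in place of a real solver (values are made up).
stub = {(8, 12): (0.7, "alpha_a"), (14, 3): None, (18, 5): (0.9, "alpha_b")}
chosen = best_leader_strategy(stub.keys(), stub.get)
```

Removing dominated actions beforehand simply shrinks the `actions` iterable, which is why it reduces the number of problems to be solved.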

We report in Figure 9 the transition probabilities corresponding to the
leader-follower equilibrium for the setting of Figure 7. The intruder's best
response is enter-when(08, 12).

It is worth commenting on how the size of the problem changes when $l = 2$.
In this case, variables $\alpha$ are defined as $\alpha_{h,z}$ with
$h = (i,j)$, representing the probability that the patroller moves to vertex
$z$ after history $h$, and variables $\gamma$ are indexed by the histories
$h^0 = (i^0, j^0)$ and $h = (i,j)$ (in addition to $w$ and $t$), representing
the probability that the patroller reaches vertex $j$ from vertex $i$ in $w$
steps, starting after history $h^0$ and not passing through target $t$. The
constraints with $l = 2$ are similar to those with $l = 1$. The problem
presents $O(m n^4 \max_{t \in T}\{d(t)\})$ variables and constraints.

Figure 9: Optimal patrolling strategy for Figure 7.



4.2.4. Improving Efficiency in Strictly Competitive Settings

Strictly competitive games are two-player games with specific constraints
over players' preferences [?]. Call $i$ and $-i$ the two players. In strictly
competitive games we have that, for each possible pair of game outcomes
$x, y$, if $u_i(x) > u_i(y)$, then $u_{-i}(x) < u_{-i}(y)$, and vice versa.
Therefore, in a strictly competitive game, if $u_i(x) = u_i(y)$ for two
outcomes $x$ and $y$, then it must be that $u_{-i}(x) = u_{-i}(y)$. Zero-sum
games are a specific class of strictly competitive games. A strictly
competitive game is easier to solve than a general one. This is because, with
strictly competitive games, von Neumann's theorem assures that the maxmin,²
minmax,³ and equilibrium (both Nash and leader-follower) strategies are the
same [10], and it turns out that computing maxmin and minmax strategies is
much easier than computing an equilibrium strategy (Nash or refinements). As
we shall show in this section, in a strictly competitive setting, the second
and the third step of our algorithm collapse to a unique bilinear optimization
problem of the size of the feasibility problem (6)-(11) of Section 4.2.3.

Our patrolling game can be studied as a strictly competitive game when,
for all targets $i,j \in T$, we have that if $v_{\mathbf{p}}(i) > v_{\mathbf{p}}(j)$,
then $v_{\mathbf{i}}(i) > v_{\mathbf{i}}(j)$, and vice versa. Essentially we
are requiring that both the patroller and the intruder have the same
preference ordering over the targets. This assumption appears reasonable in a
large number of practical settings, especially when the values of the targets
are common values. Rigorously speaking, when the above constraints hold, the
game is not a strictly competitive game. This is because two outcomes (i.e.,
intruder-capture and no-attack) provide the patroller with the same utility
and the intruder with two (in general) different utilities (i.e.,
$-\epsilon$ and 0, respectively). However, we can temporarily discard the
outcome no-attack, assuming that action stay-out will not be played by the
intruder. We shall reconsider this action below. Without the outcome no-attack
and with the above constraints over the agents' valuations of the targets, the
game is strictly competitive. We provide a mathematical programming
formulation to find the patroller's minmax strategy, namely, the strategy that
minimizes the intruder's expected utility and, the game being strictly
competitive, maximizes the patroller's utility. We call $u$ the minmax value,
i.e., the expected utility of the intruder. The mathematical programming
formulation is the following:

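\begin{align}
\min \quad & u \notag\\
\text{s.t.} \quad & \text{constraints (6)-(10)} \notag\\
& u_{\mathbf{i}}(\text{intruder-capture}) \Bigl(1 - \sum_{i \in V \setminus t} \gamma_{z,i}^{d(t),t}\Bigr) + u_{\mathbf{i}}(\text{penetration-}t) \sum_{i \in V \setminus t} \gamma_{z,i}^{d(t),t} \leq u \quad \forall t \in T,\ z \in V \tag{13}
\end{align}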



2 The maxmin strategy of player i is the strategy that maximizes i's expected utility 
given that all the opponents happen to play the strategies which cause the greatest harm 
to i. 

3 The minmax strategy of player i is the strategy that minimizes −i's expected
utility.
Constraints (13) assure that all the intruder's actions provide it with at
most $u$ expected utility. It can be easily observed that the size, in terms
of variables and constraints, of this mathematical programming problem is
$O(m n^2 \max_{t \in T}\{d(t)\})$, which is the same as that of the
feasibility problem of the second step of our algorithm for general,
non-strictly competitive, games (see previous section).

Now, we reconsider action stay-out and its corresponding outcome
no-attack. The basic idea is that the intruder will play stay-out if it pays
better than any other action. Furthermore, the intruder knows that the utility
of stay-out is 0, independently of the patroller's strategy. A simple
approach is to solve the above mathematical programming problem and to
compare $u$ with 0: if $u \leq 0$, then the intruder will play stay-out,
otherwise it will not. Therefore, in a strictly competitive setting, the second
and the third step of our algorithm collapse to the resolution of a unique
optimization problem plus a comparison between $u$ and 0.
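The two tests involved in the strictly competitive shortcut (same preference ordering over the targets, and comparison of the minmax value with 0) can be sketched as follows. The function names and the dictionary representation of the valuations are assumptions of this example, and ties at 0 are resolved in favor of stay-out, which is consistent with requiring stay-out to be a best response in the feasibility problem of the second step.

```python
from typing import Dict


def same_preference_ordering(v_p: Dict[int, float], v_i: Dict[int, float]) -> bool:
    """True if, for every pair of targets a and b,
    v_p(a) > v_p(b) holds exactly when v_i(a) > v_i(b)."""
    targets = list(v_p)
    return all(
        (v_p[a] > v_p[b]) == (v_i[a] > v_i[b])
        for a in targets for b in targets
    )


def intruder_choice(minmax_value: float) -> str:
    """Compare the minmax value u with the utility of stay-out, which is 0."""
    return "stay-out" if minmax_value <= 0 else "enter"


# Hypothetical valuations over three targets.
v_patroller = {8: 0.2, 14: 0.5, 18: 0.3}
v_intruder_ok = {8: 1.0, 14: 4.0, 18: 2.0}    # same ordering as the patroller
v_intruder_bad = {8: 4.0, 14: 1.0, 18: 2.0}   # different ordering
```

When `same_preference_ordering` fails, the general procedure with the feasibility problem and the multi-bilinear problem has to be used instead.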

4.2.5. Theoretical Properties

In this section we report some theoretical properties of our algorithm. At
first we focus on $l$. We show that when the topology is fully connected the
patroller receives the largest expected utility with any $l \geq \bar{l} = 0$.
Subsequently, we show that, in arbitrary settings, the value $\bar{l}$ such
that the patroller's expected utility is maximum is larger than 1 and
therefore the Markovian equilibrium strategy we calculate is not the best one
for the patroller. Finally, we show that, when searching for a
non-deterministic equilibrium strategy, we cannot reduce the graph $G$ to a
graph $G'$ where all the non-target vertices are removed as we have done in
Section 3.

We start by discussing $\bar{l}$. We show that when the environment has a
fully connected topology then $\bar{l} = 0$. We state the following theorem.

Theorem 4.1. For a fully connected topology, $\bar{l} = 0$.

Proof sketch. We prove that, in an environment with fully connected
topology, our algorithm produces the same leader-follower equilibrium when
$l = 0$ and $l = 1$. We consider the basic case with three vertices denoted by
$\{1, 2, 3\}$ and $d(i) = 2$ for any vertex $i$, where agents are risk
neutral. The proof in the general case with more complex patrolling settings
and with $l > 1$ is a generalization and we omit it. Suppose $l = 1$. Consider
the bilinear programming problem of Section 4.2.3 in which the best response
of the intruder is supposed to be enter-when(2, 1). The objective function can
be written as $\sum_{i \neq 2} v_{\mathbf{p}}(i) \cdot (1 - P) + \sum_{i} v_{\mathbf{p}}(i) \cdot P$,
where $P = \alpha_{1,2} + \alpha_{1,1}\alpha_{1,2} + \alpha_{1,3}\alpha_{3,2}$
is the probability to capture the intruder in vertex 2. Since
$\sum_{i} v_{\mathbf{p}}(i) > \sum_{i \neq 2} v_{\mathbf{p}}(i)$, the
maximization of the objective function can be reduced to the maximization of
$P$. We denote by $EU_i(j)$ agent $i$'s expected utility from making action
$j$. The constraints are
(a) $EU_{\mathbf{i}}(\text{enter-when}(2,1)) \geq EU_{\mathbf{i}}(\text{enter-when}(2,2))$
and $EU_{\mathbf{i}}(\text{enter-when}(2,1)) \geq EU_{\mathbf{i}}(\text{enter-when}(2,3))$,
and (b) $EU_{\mathbf{i}}(\text{enter-when}(2,1)) \geq EU_{\mathbf{i}}(\text{enter-when}(i,j))$
for $i \in \{1,3\}$ and for $j \in \{1,2,3\}$. Consider the first constraint
of (a). It can be written as (with the same reduction used for the objective
function):
$\alpha_{1,2} + \alpha_{1,1}\alpha_{1,2} + \alpha_{1,3}\alpha_{3,2} \leq \alpha_{2,2} + \alpha_{2,1}\alpha_{1,2} + \alpha_{2,3}\alpha_{3,2}$.
The second constraint of (a) can be written analogously. Since the objective
of the patroller is to maximize
$\alpha_{1,2} + \alpha_{1,1}\alpha_{1,2} + \alpha_{1,3}\alpha_{3,2}$, we have
that the maximum is when either
$\alpha_{1,2} = \alpha_{2,2} = \alpha_{3,2} = 0$ or
$\alpha_{i,j} = \alpha_{k,j}$ for all $i,j,k$. The first option is not
possible, since it prescribes that the patroller never patrols vertex 2
knowing that the best response of the intruder is to enter vertex 2. Thus, the
second option holds. Under this option, the mathematical problem reduces to
the one with $l = 0$ and therefore they admit the same solution. In the
situation in which constraints (b) are stricter than constraints (a), the
corresponding problem with $l = 0$ is infeasible and there is an action of the
intruder such that the utility expected by the patroller is larger than that
expected with enter-when(2, 1) (against the assumption that this is the best
response for the intruder). □
With other topologies, it is generally $\bar{l} > 1$. To see it, let us
consider the setting of Figure 1 with the penetration times reported in
Figure ??(b). Such a patrolling problem admits a deterministic equilibrium
strategy, as shown in Section 3, and therefore the patroller can preserve all
the value, forcing the intruder to play stay-out. Now, suppose we apply the
mathematical programming formulation of Section 4.2.3 to this problem. Since
that formulation is based on the assumption that $l = 1$, the probability that
the intruder will be captured when it plays, e.g., enter-when(06, 23) is
strictly lower than 1. Indeed, the values $\alpha_{i,j}$ with
$i \in \{06, 12, 18\}$ are strictly positive to assure that the patroller can
cover all the targets. Then, by Markov chains, it follows that the probability
that the patroller reaches vertex 06 starting from vertex 23 within 9 turns is
strictly lower than 1. We can always find a strictly positive value of the
penalty for being captured, $\epsilon$, such that, when the patroller follows
the Markovian strategy, the intruder strictly prefers to attack a target
rather than not to attack. Since the intruder will attack and the probability
of being captured is strictly lower than 1, the utility expected by the
patroller from following the Markovian strategy (with $l = 1$) will be
strictly lower than the utility expected from following the deterministic
equilibrium strategy (for which $l$ is indefinitely large). As this example
shows, in general $\bar{l} > 1$.
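The gap between a Markovian strategy and a deterministic one can be seen on a much smaller instance than the one of Figure 1. The sketch below uses a hypothetical path graph 0 - 1 - 2 (not taken from the paper): a Markovian patroller that leaves vertex 1 toward either end with probability 0.5 is compared with the deterministic tour 0, 1, 2, 1 repeated forever. All names and numbers are assumptions of this example.

```python
from typing import List


def reach_probability(alpha: List[List[float]], start: int,
                      target: int, steps: int) -> float:
    """Probability that a Markovian patroller starting at `start`
    visits `target` within `steps` moves."""
    n = len(alpha)
    alive = [0.0] * n          # mass that has not visited the target yet
    alive[start] = 1.0
    for _ in range(steps):
        nxt = [0.0] * n
        for x in range(n):
            for j in range(n):
                if j != target:            # mass entering the target is removed
                    nxt[j] += alive[x] * alpha[x][j]
        alive = nxt
    return 1.0 - sum(alive)


def cycle_reaches(cycle: List[int], phase: int, target: int, steps: int) -> bool:
    """True if a patroller following `cycle` from position `phase`
    visits `target` within `steps` moves."""
    return any(cycle[(phase + k) % len(cycle)] == target
               for k in range(1, steps + 1))


# Path graph 0 - 1 - 2 without self-loops; both ends must be covered,
# so the Markovian patroller leaves vertex 1 in either direction.
path_markov = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
tour = [0, 1, 2, 1]

markov = reach_probability(path_markov, start=2, target=0, steps=4)
deterministic = all(cycle_reaches(tour, phase, 0, 4) for phase in range(len(tour)))
```

With a penetration time of 4 turns, the tour returns to vertex 0 in time from every phase, while the Markovian strategy does so only with probability 0.75, which is the effect described in the paragraph above.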

Now, we show that, in looking for a non-deterministic equilibrium
patrolling strategy, we cannot study a patrolling setting considering only the
targets and neglecting the other vertices that connect them. We state the
following proposition.

Proposition 4.2. A non-deterministic patrolling strategy $\sigma_{\mathbf{p}}$
that is defined only on the space of the targets and prescribes that the
patroller moves between targets along the shortest paths may not be optimal
for the patroller.

To show that this proposition holds, consider again our running example
depicted in Figure 1. Suppose that $\sigma_{\mathbf{p}}$ is defined only on
the targets, i.e., $\alpha_{i,j}$ is defined only when $i,j \in T$, and that
the patroller moves between targets along the shortest paths. Since the
intruder can attack a target both when the patroller is in a target and when
it is moving along the paths, it can be easily observed that the patroller
loses expected utility with respect to the situation where
$\sigma_{\mathbf{p}}$ is defined on all the vertices. Indeed, the intruder can
wait, observing the actions of the patroller and deciding the turn in which to
enter. Suppose that the intruder attacks vertex 18 after the patroller has
moved to vertex 21 from vertex 18. Since the patroller's strategy is defined
only on the targets, the patroller is going to either 08 or 14. In both cases,
the patroller will spend 6 turns. Then the patroller could come back to 18,
but it needs more than 2 turns. Thus, if the intruder attacks 18 after the
patroller has moved to 21 from 18, then the intruder will surely succeed in
its attack. This does not happen when $\sigma_{\mathbf{p}}$ is defined on all
the vertices $V$.

4.3. Experimental Evaluation

In this section we discuss the performance achieved by our algorithm in
computing a non-deterministic equilibrium strategy. Unlike the experimental
evaluation of the algorithm for computing a deterministic patrolling strategy
presented in Section 3.5, the patrolling settings we tested here are not
randomly generated, but have been carefully handcrafted to highlight some
characteristics of our approach. Since the algorithm to find the
non-deterministic strategy is much more computationally expensive than that
employed in the deterministic case, the resolution of a large number of
randomly generated instances would be computationally intractable, making a
good experimental evaluation difficult. Furthermore, our main objective in
this phase is not only to assess the absolute performance achieved by our
algorithm, but also to evaluate the relative improvements that the reduction
based on dominance of actions and the possibility to exploit a strictly
competitive formulation could introduce. For these reasons, we conducted
experiments on a number of patrolling settings, built with the aim of
reproducing common situations that are likely to be found in real-world
applications. We assume agents to be risk neutral. For each setting we
computed the non-deterministic equilibrium strategy with and without the
reduction of dominated actions. Moreover, we also evaluated the performance
when the setting allows one to compute the strategy using a strictly
competitive formulation. In order to perform this last set of experiments, we
modified functions $v_{\mathbf{p}}$ and $v_{\mathbf{i}}$ in the patrolling
settings for which the requirements described in Section 4.2.4 are not
satisfied. All the algorithms have been coded in C and the optimization
problems have been formulated in the AMPL [46] syntax and solved with the
SNOPT [44] solver. Tests were conducted on a Linux (2.6.24 kernel) computer
with a dual quad-core Intel Xeon 2.33 GHz CPU, 8 GB RAM, and 4 MB cache.

In order to have a fair evaluation, the considered patrolling settings are
such that no equilibrium patrolling strategy inducing stay-out as the
intruder's best response could be found. In other words, to find the
equilibrium patrolling strategy, all the three steps composing the algorithm
for non-deterministic patrolling strategies need to be executed. We do not
report the computational time spent for the removal of dominated actions,
since it is negligible, being in all the settings shorter than one second. In
Table 2 we report the experimental results related to our running example of
Figure 1. We consider four different configurations: 'complete' refers to the
basic case in which no dominated action is removed, 'reduced p' refers to the
case in which only the patroller's dominated actions are removed,
'reduced p, i' refers to the case in which all the dominated actions are
removed, and 'sc' refers to the strictly competitive setting. We report the
following data: 'total time' is the computational time, 'opt. prob.' is the
number of optimization problems to be solved, 'average time' is the average
computational time to solve a single optimization problem, 'max time' and
'min time' are the largest and the shortest, respectively, computational times
to solve a single optimization problem, and 'std dev.' is the standard
deviation. We report in Figures 10, 11, 12, 13, and 14 the graph
representations of other patrolling settings and the corresponding
experimental results. In these cases, 'reduced p' is omitted, since no
patroller's action is dominated.





                 complete    reduced p          reduced p, i    sc
total time       > 48 h      31 h 21 min 25 s   29 min 13 s     48 s
opt. prob.       812         240                30              1
average time     -           7 min 54 s         58 s            -
max time         -           10 h 4 min 52 s    1 min 49 s      -
min time         -           46 s               8 s             -
std dev.         -           51 min 55 s        3 min 16 s      -

Table 2: Results for the running example of Figure 1.
Figure 10: Results for a corridor-like setting with 10 vertices.

As can be seen for all the settings, when no reduction is applied, the
computational time needed to find the solution is very large. Indeed, a
considerable number of optimization problems have to be solved, most of which
are not necessary, being associated with the intruder's dominated actions.
Moreover, each optimization problem is also characterized by many constraints
and variables. Results show a remarkable improvement in performance when
the reduction of the patrolling setting based on dominated actions is applied
before computing the solution. An average 96% reduction of the computational
time was observed over all the settings for the non-strictly competitive
case. This significant decrease in the total computational time is directly
related to the smaller number of optimization problems to be solved. However,
a lower average time for the single problem has been obtained too. If we
consider that the time needed to perform the reduction of dominated actions
is negligible, such a pre-processing phase becomes of paramount importance
for making our approach applicable, especially in complex patrolling settings.

Figure 11: Results for a ring-like setting with 15 vertices.

Finally, in all the settings the best performance is achieved when using the
strictly competitive formulation. Such a significant improvement is due to the
fact that only a single (and reduced) optimization problem needs to be
solved. Although this kind of formulation can be exploited only in specific
cases, it constitutes a promising direction for the resolution of very complex
patrolling settings.
Figure 12: Results for a tree-like setting with 16 vertices.

Figure 13: Results for an eight-like setting with 13 vertices.
                 complete      reduced p, i   sc
total time       46 h 52 min   1 h 53 min     4 min 45 s
opt. prob.       380           28             1
average time     6 min 59 s    4 min 1 s      -
max time         2 h 54 min    9 min 51 s     -
min time         2 min 38 s    1 min 42 s     -
std dev.         8 min 29 s    1 min 43 s     -

Figure 14: Results for an eight-like setting with 20 vertices.
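The relative savings can be recomputed directly from the reported total times. The short sketch below does this for the running example (Table 2) and for the setting of Figure 14; the helper names are ours, and the 96% figure quoted in the text is an average over all the settings, so it is not expected to match any single row exactly.

```python
import re

_SECONDS = {"h": 3600, "min": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert a duration such as '31 h 21 min 25 s' into seconds."""
    return sum(int(value) * _SECONDS[unit]
               for value, unit in re.findall(r"(\d+)\s*(h|min|s)\b", text))


def reduction(before: str, after: str) -> float:
    """Percentage of computational time saved going from `before` to `after`."""
    return 100.0 * (1.0 - to_seconds(after) / to_seconds(before))


# Figure 14: 'complete' versus 'reduced p, i'.
fig14 = reduction("46 h 52 min", "1 h 53 min")
# Table 2: 'complete' exceeded 48 h, so this is only a lower bound on the saving.
table2_lower_bound = reduction("48 h", "29 min 13 s")
```

For Figure 14 this gives a saving of about 96%, and for the running example a saving of at least about 99%.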



5. Extending the Framework 

In this section, we discuss how our framework can be extended to capture
some more realistic aspects and to improve its efficiency in finding a patrolling
strategy. First, we enrich the expressiveness of the framework (Section 5.1)
and, then, we propose some improvements for the algorithms (Section 5.2).

5.1. Enriching Framework Expressiveness 

We discuss how the framework expressiveness can be improved to capture
the following situations. We introduce uncertainty over the intruder's
valuations (Section 5.1.1) and over the intruder's penetration times
(Section 5.1.2), and we enrich the patroller's sensing capabilities
(Section 5.1.3). Then we extend the intruder's movement model (Section 5.1.4),
we introduce a delay on the intruder's entering (Section 5.1.5), we introduce
the intruder's partial observability over the patroller's actions
(Section 5.1.6), and we consider the situation in which there are multiple
patrolling robots (Section 5.1.7). For each extension, we discuss its impact
on the model and on the computation of deterministic and non-deterministic
equilibrium strategies.
5.1.1. Uncertainty over Intruder's Valuations

We consider the situation wherein the patroller does not know with
certainty the intruder's valuations $v_{\mathbf{i}}$s over the targets.

Model According to Harsanyi's transformation [?], a game with uncertain
information is cast to a game with imperfect information. More precisely,
each player can be of different types, each one with different values of the
payoffs, and the players do not perfectly observe the actual type of the
opponents, e.g., see [?]. In our patrolling game, the intruder could be of
different types $\theta_j \in \Theta_{\mathbf{i}}$, each one with specific
valuations $v_{\mathbf{i}}(i, \theta_j)$ over target $i$ and penalty
$\epsilon_j$ for being captured. Each intruder's type $\theta_j$ is assigned a
probability $\omega_j$. The intruder strategy $\sigma_{\mathbf{i}}$ must
define a strategy for each type of intruder.

Deterministic Equilibrium Computation No extension is required. In- 
deed, the algorithm for the computation of the deterministic equilib- 
rium strategy described in Section 3 does not depend on the intruder's 
valuations. 

Non-Deterministic Equilibrium Computation The step of the algorithm
that computes the dominated actions does not require any extension,
since it does not depend on the intruder's valuations. Considering the
non-strictly competitive case, the bilinear mathematical programming
formulation must be extended as follows. We need to compute the best
patroller's strategy for each profile of actions of the intruder's types,
e.g., with two intruder's types, for each
(enter-when($i_1$, $h_1$), enter-when($i_2$, $h_2$)), where $i_j$, $h_j$
correspond to the $j$-th type. This requires the introduction of constraints
for each intruder's type in the mathematical programming problem. As a result,
both the number of constraints and the number of mathematical programming
problems to be solved increase with a power of the number of types
$|\Theta_{\mathbf{i}}|$. This makes the exact resolution of real-world
settings impractical and pushes for finding approximate algorithms. Let us
consider the strictly competitive case. By definition, the intruder's
valuations can be arbitrary, under the constraint that they are strictly
competitive with respect to the patroller's ones. If all the possible
intruder's types have valuations that are strictly competitive with respect to
the patroller's ones, we can ignore the intruder's types and study the game as
a classical strictly competitive game where the intruder's valuations are
those of a possible type. This makes the exact resolution of significantly
complex settings affordable.
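To give a feeling for the growth in the non-strictly competitive case, the sketch below enumerates the profiles of actions of the intruder's types when no dominated action is removed. It only counts the bilinear problems to be formulated; the function names and the tiny instance are assumptions of this example.

```python
from itertools import product
from typing import Iterator, List, Tuple

Action = Tuple[int, int]   # enter-when(t, z): target t, patroller observed at z


def joint_action_profiles(targets: List[int], vertices: List[int],
                          num_types: int) -> Iterator[Tuple[Action, ...]]:
    """One candidate best response per intruder type, for every combination."""
    actions = [(t, z) for t in targets for z in vertices]
    return product(actions, repeat=num_types)


def number_of_programs(m: int, n: int, num_types: int) -> int:
    """Number of profiles with m targets, n vertices and no dominance pruning."""
    return (m * n) ** num_types


# Hypothetical instance: 2 targets among 3 vertices, 2 intruder types.
profiles = list(joint_action_profiles([0, 1], [0, 1, 2], num_types=2))
```

With the 812 actions of the 'complete' configuration of the running example, two types would already require 812 × 812 = 659,344 problems, which is why the strictly competitive shortcut matters here.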



5.1.2. Uncertainty over Intruder's Penetration Times 

We consider the situation wherein penetration times of the intruder are 
uncertain due to the fact that the environment is not perfectly predictable 
or modeled. 

Model Each target $t$'s penetration time $d(t)$ is described by a discrete
probability distribution, e.g.,

\[
d(t) = \begin{cases}
8 & \text{with probability } 0.3\\
9 & \text{with probability } 0.4\\
10 & \text{with probability } 0.3
\end{cases}
\]

Deterministic Equilibrium Computation For each target, we consider 
the shortest possible penetration time according to the given probabil- 
ity distribution (8, in the example above). Beyond that, the algorithm 
works as in the basic case, but considering such penetration time. 

Non-Deterministic Equilibrium Computation For each target, we consider
the largest possible penetration time according to the given probability
distribution (10, in the example above). The computation of
the dominated actions works as in the basic case, but using such penetration
time. The mathematical programming formulations, for both
the non-strictly competitive and the strictly competitive cases, must be
extended as follows. The value of $w$ in constraints (10) is in the range
$\{2, \ldots, \max(d(t))\}$, where $\max(d(t))$ is the largest possible
penetration time for target $t$ according to the given probability
distribution. The computation of the intruder's expected utility in
constraints (11) must be weighted with the probabilities of the possible
values of $d(t)$. The number of constraints and the number of optimization
problems to be solved is the same as in the basic case.
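The weighting of the intruder's expected utility over the possible penetration times can be sketched as follows. The distribution is the one of the example above; the success probabilities (i.e., the probability that the patroller does not reach the target within a given number of turns) and the utility values are invented for illustration, and the function name is ours.

```python
from typing import Dict


def expected_intruder_utility(success_by_duration: Dict[int, float],
                              duration_dist: Dict[int, float],
                              u_capture: float,
                              u_penetration: float) -> float:
    """Expected utility of an enter-when action when d(t) is uncertain.

    success_by_duration[d] is the probability that the patroller does not
    sense the target within d turns (the sum of the gamma terms for w = d).
    """
    total = 0.0
    for duration, prob in duration_dist.items():
        p_success = success_by_duration[duration]
        total += prob * (u_capture * (1.0 - p_success) + u_penetration * p_success)
    return total


# Distribution from the example in the text; the other numbers are hypothetical.
dist = {8: 0.3, 9: 0.4, 10: 0.3}
success = {8: 0.5, 9: 0.4, 10: 0.3}
value = expected_intruder_utility(success, dist, u_capture=-1.0, u_penetration=1.0)
```

A longer penetration time gives the patroller more turns to come back, so the success probability decreases with the duration, and the weighted value is what must not exceed the utility of the assumed best response.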



5.1.3. Augmented Patroller's Sensing Capabilities 

We consider the situation wherein the patroller's sensing capabilities are
augmented. A more detailed description of this situation can be found in [22].

Model The model has been already described in Section 2.1 and uses a
generic function $S(\cdot, \cdot)$ to model the sensing capabilities of the
patroller.

Deterministic Equilibrium Computation This kind of strategy can be
computed only if we consider sensing with probability of one. Formally,
we require that $S(i,j) = 0$ or $S(i,j) = 1$ for all vertices $i$ and $j$. The
generation of the reduced graph $G'$ must be modified to consider $S$.
In practice, $G'$ is composed also of additional vertices that are on the
boundary of the sensing range with respect to a target, i.e., the farthest
vertices from which the patroller can sense the target. The algorithm
can then be easily extended to work on this graph.

Non-Deterministic Equilibrium Computation In the computation of
the dominated actions, action enter-when(i, j) is dominated by action
enter-when(i, k) if it results to be dominated according to the algorithm
provided in Section 4.2.2 and $S(j,i) > S(j,k)$. The mathematical
programming problems (for both the non-strictly competitive and
the strictly competitive cases) can be easily modified to capture
probabilistic augmented sensing capabilities. More precisely, the right term
of constraints (10) and (11) must be multiplied by $1 - S(i,j)$. This
does not increase the number of constraints and the number of optimization
problems to be solved in the non-strictly competitive case
and, as shown in [22], no significant additional computational time is
needed to address this extension.

5.1.4. Refinements on Intruder's Movement Model

We extend the movement model of the intruder. A more detailed description
of this extension can be found in [23].

Model We enrich the model by introducing access areas, which are special
vertices, constraining the intruder to move along paths connecting access
areas to targets, and allowing the patroller to capture the intruder
also along these paths. As it is often the case in pursuit-evasion, we
assume that the intruder can move infinitely fast. Therefore, covering
a path takes only one turn, regardless of its length. The intruder's
actions are now of the form enter-when(p, i) where p is a path connecting
an access area to a target. By employing search trees the minimal set
of paths to be considered can be computed.
Deterministic Equilibrium Computation The algorithm is exactly the
same as the one described in Section 3, except for the special case in which
all the possible intruder's paths share a common vertex. In this case, the
optimal strategy for the patroller is obviously to stay forever in such
vertex.

Non-Deterministic Equilibrium Computation In the computation of
the dominated actions, action enter-when(p, j) is dominated by action
enter-when(p, k) if it results to be dominated according to the algorithm
provided in Section 4.2.2 and k is not adjacent to any vertex of
p. The mathematical programming problems (both non-strictly competitive
and strictly competitive) can be easily modified to capture the
intruder's movement along paths. More precisely, the right term of
constraints (10) must consider that all the vertices belonging to p are
sensed when $w = 1$. This does not increase the number of constraints
and the number of optimization problems to be solved in the non-strictly
competitive case and, as shown in [23], no significant additional
computational time is required to address this extension.

5.1.5. Delay on Intruder's Entering 

We consider the situation wherein there is a delay between the turn at
which the intruder decides to enter and the turn at which it actually enters.
A more detailed description of this situation can be found in [22].

Model We introduce a delay $D \in \mathbb{N}^{+}$ between the turn at which
the intruder decides to enter a target $t$ and the turn at which it actually
enters $t$. A probability distribution can be defined over this delay.

Deterministic Equilibrium Computation No modification is required 
with respect to the algorithm described in Section 3. 

Non-Deterministic Equilibrium Computation In the computation of 
the dominated actions, we need to add the delay D to the penetration 
time of each target. The mathematical programming problems (both 
non-strictly competitive and strictly competitive) must be modified 
as follows. Constraints (10) must be rewritten considering that the 
range of w is {D + 1, ..., D + d(t)}. We also need to introduce 
constraints similar to constraints (11) with w in {2, ..., D}. These 
constraints make it possible to estimate the position of the patroller 
at the turn in which the intruder enters. This extension lengthens the 
range of w and introduces additional constraints; however, as shown in 
[22], no significant additional computational time is needed to address 
this extension. 
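
The index bookkeeping implied by the delay can be sketched as follows. This is only an illustration, assuming an integer delay D ≥ 1 and integer penetration times d(t); the function names and the example values are hypothetical, and the constraints themselves are not reproduced here.

```python
from typing import Dict, Hashable


def delayed_penetration_times(d: Dict[Hashable, int], delay: int) -> Dict[Hashable, int]:
    """Penetration times used in the dominance computation: d(t) + D."""
    return {target: time + delay for target, time in d.items()}


def intrusion_range(d_t: int, delay: int) -> range:
    """Values of w for the rewritten constraints: {D + 1, ..., D + d(t)}."""
    return range(delay + 1, delay + d_t + 1)


def waiting_range(delay: int) -> range:
    """Values of w for the additional constraints: {2, ..., D} (empty if D = 1)."""
    return range(2, delay + 1)


# Hypothetical instance: two targets and a delay of D = 3 turns.
d = {"t1": 4, "t2": 2}
D = 3
shifted = delayed_penetration_times(d, D)
w_intrusion = list(intrusion_range(d["t1"], D))
w_waiting = list(waiting_range(D))
```

With D = 1 the waiting range is empty, so only the shifted intrusion range differs from the basic model.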

5.1.6. Intruder's Partial Observation over Patroller's Actions 

We consider the situation wherein the intruder, when it decides to enter, 
can partially observe the position of the patroller. A more detailed 
description of this situation can be found in [23]. 



Model As in Section 5.1.4, we enrich the model by introducing access areas 
and constraining the intruder to enter the patrolling setting through 
these access areas. We assign each access area a view over the vertices, 
i.e., the set of vertices that can be observed from such access area. 
Since the intruder cannot perfectly observe the position of the patroller 
when it enters and can enter when the patroller is out of the view of the 
intruder, we need to redefine the intruder's action space. Specifically, 
we define a state s = (j, k) where j is a vertex and k is an integer 
denoting time. The meaning of s is: at the last observation, k turns 
before, the patroller was in vertex j. Thus, the intruder's actions are 
enter-when(i, s), where i is a target and s is a state. 

Deterministic Equilibrium Computation No modification is required 
with respect to the algorithm described in Section 3. 

Non-Deterministic Equilibrium Computation We need to redefine the 
computation of the dominated actions to take into account the 
possibility that the intruder enters when the patroller is not 
observable. In principle, the intruder has infinitely many actions, 
since k in s can take infinitely many values. However, it can be shown 
that only a finite number of actions are non-dominated. That is, we can 
safely consider a finite number of values for k. The mathematical 
programming problems must be modified in a way similar to that of 
Section 5.1.5. Indeed, to compute the patroller's best strategy when 
the intruder makes enter-when(i, s) with s = (j, k) and k > 0, we need 
to estimate the position of the patroller. This estimation is based on 
the patroller's strategy. As shown in [23], addressing this extension 
can require significant additional computational time. 
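
One simple way to obtain such an estimate is to propagate a belief over the patroller's position for k turns. The sketch below is a minimal Python illustration, assuming that the patroller's strategy is Markovian and given as a dictionary of transition probabilities between adjacent vertices; the names and the three-vertex example are hypothetical. It deliberately ignores the additional information that the patroller was not observed during the k turns, which the full model may need to account for.

```python
from typing import Dict, Hashable

# strategy[i][j] is the probability of moving from vertex i to vertex j.
Strategy = Dict[Hashable, Dict[Hashable, float]]


def position_estimate(strategy: Strategy, last_seen: Hashable, turns: int) -> Dict[Hashable, float]:
    """Distribution over the patroller's position `turns` turns after it was seen in `last_seen`."""
    belief = {last_seen: 1.0}
    for _ in range(turns):
        next_belief: Dict[Hashable, float] = {}
        for vertex, mass in belief.items():
            for successor, prob in strategy[vertex].items():
                next_belief[successor] = next_belief.get(successor, 0.0) + mass * prob
        belief = next_belief
    return belief


# Hypothetical strategy on a triangle: from each vertex, move to either neighbor with probability 0.5.
strategy = {
    "a": {"b": 0.5, "c": 0.5},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 0.5, "b": 0.5},
}
estimate = position_estimate(strategy, last_seen="a", turns=2)
```

For the state s = ("a", 2), the estimate assigns probability 0.5 to vertex a and 0.25 to each of b and c.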
5.1.7. Multiple Patrolling Robots 

We consider the situation wherein there are multiple patrolling robots. A 
more detailed description of this situation can be found in [21]. 



Model Multiple robots can be easily captured in the model by exploiting 
patroller's augmented sensing capabilities. Let us assume that the en- 
vironment is a perimeter and that the robots move together in a syn- 
chronized fashion (i.e., at each turn they all move clockwise or coun- 
terclockwise, as in [7]). We can define a fictitious single robot with 
an appropriate S(i,j) such that it senses all the vertices that the real 
robots would sense. In the general case, we can model the position 
of the robots as a tuple and their actions as a joint action. The 
intruder's actions are of the form enter-when(i, (j_1, ..., j_g)), where 
j_r is the position of the r-th robot and g is the number of robots. 
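
Both modeling options can be illustrated with a short Python sketch. It assumes that the environment is given as an adjacency dictionary and that the set of vertices sensed from each position is known; the names `sensed_from`, `fictitious_sensing`, and `joint_actions` are hypothetical and only serve the example.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Vertex = Hashable


def fictitious_sensing(positions: Sequence[Vertex],
                       sensed_from: Dict[Vertex, Set[Vertex]]) -> Set[Vertex]:
    """Vertices sensed by a fictitious single robot that stands for all the real robots."""
    covered: Set[Vertex] = set()
    for position in positions:
        covered |= sensed_from[position]
    return covered


def joint_actions(positions: Sequence[Vertex],
                  adjacency: Dict[Vertex, List[Vertex]]) -> List[Tuple[Vertex, ...]]:
    """All joint moves in the general case: one adjacent vertex per robot."""
    return list(product(*(adjacency[position] for position in positions)))


# Hypothetical perimeter with four vertices and two robots in opposite corners.
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
sensed_from = {vertex: {vertex} for vertex in adjacency}
positions = (0, 2)
covered = fictitious_sensing(positions, sensed_from)
moves = joint_actions(positions, adjacency)
```

In the synchronized perimeter case only the two joint moves in which all robots turn the same way are used, whereas in the general case the number of joint moves is the product of the robots' individual options.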

Deterministic Equilibrium Computation The algorithm described in Sec- 
tion 3 must be modified as follows. The solution is a tuple of strategies, 
one for each specific robot. That is, the algorithm produces a tuple of 
cycles. At each step of the search tree, the algorithm builds upon the 
current partial solution by adding a target to the strategy of each robot. 
This target will be different for each robot. 

Non-Deterministic Equilibrium Computation The determination of the 
intruder's dominated actions must consider the presence of multiple 
robots. Specifically, the algorithm presented in Section 4.2.2 must be 
repeated for each patrolling robot. An action is dominated if it is dom- 
inated with respect to all the robots. The number of constraints in the 
mathematical programming problems (for both the non-strictly com- 
petitive and the strictly competitive cases) increases since the intruder's 
space of actions is larger. The number of optimization problems to be 
solved in the non-strictly competitive case increases exponentially with 
g (the number of robots). The considerations on computational time 
are similar to those reported in Section 5.1.1. 

5.2. Improving Algorithms 

In this section, we discuss how our solving algorithm can be extended 
to situations in which the optimal patrolling strategy does not cover all the 
targets (Section 5.2.1) and how approximate solutions can be found 
(Section 5.2.2). For each extension, we discuss its impact on the computation of 
deterministic (when applicable) and non-deterministic equilibria. 

5.2.1. Computing Non-Full Coverage Strategies 

We consider the situation in which the optimal strategy does not cover 
all the targets and we cannot introduce additional patrolling robots. 

Deterministic Equilibrium Computation In this case, the aim is to 
determine the subset of targets to be patrolled that maximizes the 
patroller's utility. This can be accomplished by iteratively removing 
the targets in increasing order of patroller's utility and searching at 
each iteration for a deterministic equilibrium strategy as prescribed in 
Section 3. The algorithm stops when it finds a subset of targets that 
admits a deterministic equilibrium strategy. 
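
The iterative removal can be sketched in Python as follows. The sketch assumes that the search of Section 3 is available as a callable that returns a strategy or None; `find_deterministic_strategy`, `toy_solver`, and the target values are hypothetical placeholders, not the actual algorithm.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Target = Hashable
Solver = Callable[[Sequence[Target]], Optional[List[Target]]]


def largest_coverable_subset(patroller_value: Dict[Target, float],
                             find_deterministic_strategy: Solver
                             ) -> Tuple[List[Target], Optional[List[Target]]]:
    """Drop targets in increasing order of patroller's utility until a strategy exists."""
    # Most valuable targets first, so that pop() removes the least valuable one.
    remaining = sorted(patroller_value, key=patroller_value.get, reverse=True)
    while remaining:
        strategy = find_deterministic_strategy(remaining)
        if strategy is not None:
            return remaining, strategy
        remaining.pop()
    return [], None


def toy_solver(targets: Sequence[Target]) -> Optional[List[Target]]:
    """Stand-in for the search of Section 3: pretend at most two targets can be covered."""
    return list(targets) if len(targets) <= 2 else None


values = {"t1": 0.9, "t2": 0.4, "t3": 0.7}
covered, strategy = largest_coverable_subset(values, toy_solver)
```

In this toy instance the least valuable target t2 is removed first, after which the stand-in solver succeeds on the remaining two targets.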

Non-Deterministic Equilibrium Computation No extension is required 
for the computation of the intruder's dominated actions. The 
non-deterministic equilibrium strategy can be computed iteratively, 
in a way similar to the deterministic case. More precisely, initially 
all the targets are considered. If the leader-follower equilibrium 
strategy is such that some target t is not covered, then we apply our 
algorithm to a new patrolling problem where target t is removed. 

5.2.2. Searching for ε-Equilibria 

We consider the problem of computing approximate solutions for the 
mathematical programming problems that find the non-deterministic equi- 
librium. 

Non-Deterministic Equilibrium Computation The hardness of the problem 
of searching for an exact leader-follower equilibrium can be mitigated 
by looking for approximate equilibria, called ε-equilibria. A strategy 
profile is an ε-equilibrium if no player can increase its expected 
utility by more than ε by making an off-equilibrium action. The 
literature provides several algorithms to compute these equilibria, but 
only a few of them are applicable to leader-follower settings; see, 
e.g., [47]. Thus, this remains an open problem. 
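
The definition of ε-equilibrium can be checked directly once the relevant expected utilities are known. The following Python sketch assumes that, for each player, the expected utility of the prescribed strategy and the expected utilities of its possible deviations have already been computed; the function names and the numbers are hypothetical, and the sketch does not compute an equilibrium.

```python
from typing import Iterable, Sequence, Tuple


def max_deviation_gain(prescribed_utility: float, deviation_utilities: Iterable[float]) -> float:
    """Largest gain a player obtains by deviating (0.0 if it has no deviation)."""
    return max((u - prescribed_utility for u in deviation_utilities), default=0.0)


def is_epsilon_equilibrium(players: Sequence[Tuple[float, Sequence[float]]], epsilon: float) -> bool:
    """True if no player can gain more than epsilon with an off-equilibrium action."""
    return all(max_deviation_gain(utility, deviations) <= epsilon
               for utility, deviations in players)


# Hypothetical utilities: (prescribed utility, utilities of the available deviations).
players = [
    (1.0, [1.05, 0.8]),  # patroller: its best deviation gains about 0.05
    (0.5, [0.4]),        # intruder: its only deviation loses utility
]
loose = is_epsilon_equilibrium(players, epsilon=0.1)
tight = is_epsilon_equilibrium(players, epsilon=0.01)
```

Here the profile passes the check for ε = 0.1 but fails it for ε = 0.01, because the patroller can gain about 0.05 by deviating.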
6. Conclusions 



In this paper we have presented a formal framework for addressing the 
problem of finding optimal patrolling strategies for a mobile robot moving in 
an environment to prevent intrusions. The proposed approach is based on 
the idea of modeling a patrolling situation as a game, played by the patroller 
and the intruder, and of studying its equilibria to derive the optimal pa- 
trolling strategy. The approach is more general than other game theoretical 
approaches presented in literature, since it deals with environments with ar- 
bitrary topology and with arbitrary preferences for the agents. Starting from 
the formal definitions of a patrolling setting and of the associated game, the 
main contributions of the paper have been algorithms for finding equilibrium 
patrolling strategies both in a deterministic and in a non-deterministic form. 
In the first case, the equilibrium patrolling strategy is a fixed path that, when 
followed by the patrolling robot, makes attempting an attack irrational for 
an intruder. In the second case, the equilibrium patrolling strategy is a set 
of probabilities for moving between different positions that, when followed 
by the patrolling robot, maximizes its expected utility. Both the algorithms 
have been analytically studied and experimentally validated to assess their 
properties and efficiency. 

Several avenues for future work have been outlined in Section 5. Some 
preliminary work has been started along a number of these directions, but 
most problems remain open. In addition, a fundamental aspect that is being 
addressed is the application of the theoretical framework presented in this 
paper to real robots [17], [48] and, more generally, the building of a bridge 
between studies of patrolling in the mobile robot community and those in 
the theoretical community. 